cepted because of lack of space. Several pages
published in the order of their receipt without
respect to the dates of receipt of the regular
articles. Most Brief Reports appear in the
their final acceptance.


An author who wishes to submit a Brief 
Report: 


1. Sends the Brief Report, limited to one printed 
page and prepared according to the specifications 
given below. 

2. Also sends to the Editor a full report of the re- 
search study, in sufficient detail to give a clear ac- 
count of its background, procedure, results, and con- 
clusions, which will be filed with the American 
Documentation Institute to insure indefinite avail- 
ability. 

3. Prepares at least 100 mimeographed copies of 
the full report, which the author will send without 
charge to all who request it as long as the supply 
lasts. 


4. Agrees not to submit the full report to another 
journal of general circulation. 


Specifications 
Brief Report. The Brief Report should give 
a clear, condensed summary of the procedure 
of the study and as full an account of the re- 
sults as space permits. 
To insure that the Brief Report will be no 
longer than one printed page, its typescript, 
including all matter except the title and the
author’s lines, must not exceed 85 lines av- 
eraging 42 characters and spaces in length. 
Set the typewriter margins for short lines of 
42 characters, which are 3.5 inches long in 
elite typing, and 4.2 inches long in pica. 

The manuscript of the Brief Report must 
be double spaced throughout. Except for its 
short lines, it follows the standard style of 
the 1957 revision of the APA Publication 
Manual. Headings, tables, and references are 
avoided or, if essential, must be counted in 
the 85 lines. Each Brief Report must be ac- 
companied by a footnote in the style below, 
which is typed on a separate sheet and not 
counted in the 85-line quota: 


1 An extended report of this study may be obtained
without charge from John Doe, 300 Market
St., Prospect 6, Mass. (giving the author’s full name 
and address), or for a fee from the American Docu- 
mentation Institute. Order Document No. ——, re- 
mitting $—— for microfilm or $—— for photo- 
copies. 


Extended report. Because the extended re- 
port is intended for photoduplication, and is 
not copy to be sent to a printer, its style 
should differ in several ways from that of 
other manuscripts: (a) The extended report
should be typed with single spacing for
economy in duplication. (b) Tables and figures
should be placed adjacent to the text
which refers to them. A caption should be 
typed below each figure. (c) Footnotes should 
be typed at the bottom of the page on which 
reference is made to them. In other respects, 
the full report is prepared in the style speci- 
fied by the Publication Manual. 
THE FACTORIAL STRUCTURE OF THE WISC
AT AGES 7-6, 10-6, AND 13-6¹


JACOB COHEN 


Franklin D. Roosevelt Veterans Administration Hospital and 
New York University 


The present article continues a systematic 
investigation of the intellectual domain as 
sampled by the Wechsler scales (Wechsler: 
1949, 1955, 1958). Previous articles have de- 
scribed the factorial structure of the Wechs- 
ler-Bellevue (W-B) for male veteran neuro- 
psychiatric patients (Cohen: 1952a, 1952b) 
and of the Wechsler Adult Intelligence Scale 
(WAIS) for standardization samples of adults 
over a wide age range (Cohen: 1957a, 1957b). 
These and other factor-analytic studies are 
summarized by Wechsler (1958, Ch. 8). The 
present study is concerned with the data pro- 
vided by the standardization of the Wechsler 
Intelligence Scale for Children (WISC), a 
test of form and content similar to those 
above, on standardization samples of children 
at age levels 7-6, 10-6, and 13-6. 

This study, then, provides an extension 
downward of the age range to age 7-6. Taken 
by itself, it provides some insight into the 
process of intellectual maturation via the com- 
parative analysis of the factorial structures 
for the three age groups, and the accompany- 
ing by-product of a rationale for the subtests. 
By comparing its results with those of the 
previous studies, it makes possible the pres- 
entation of a broad picture of intellectual or- 
ganization on Wechsler scales over widely 
varying conditions of age and psychopa- 
thology. 

Method 
Subjects 


The data analyzed were those of the three
age levels of the total standardization samples
reported in the manual (Wechsler, 1949). The
standardization sample was stratified at each
age by geographic area, urban-rural residence,
and parental occupation to provide a fairly
close match with the child population
distribution of the 1940 United States census.
The N of each age group was 200, made up
equally of boys and girls, all of whom were
within 1½ months of their midyear (Wechsler,
1949, pp. 7-8).


1 From the Psychiatric Evaluation Project of the
Psychology Service, Veterans Administration
Hospital, Montrose, New York. The author gratefully
acknowledges the cooperation of the management of
the hospital, and is especially indebted to Catherine
S. Henderson for typing the manuscript.


Analysis 


The matrices of intercorrelations of the 
three age groups given in the manual (Wechsler,
1949, pp. 10-12) were separately subjected
to the following analysis:²

1. Thurstone’s complete centroid method 
was used (Thurstone, 1947, pp. 161-170), 
with communalities estimated by his Equa- 
tion 15 (Thurstone, 1947, pp. 300, 318). Since 
the absolute discrepancies between estimated 
and computed communalities were small (me- 
dian = .02), the factorizations were not reit- 
erated. The decision as to the number of fac- 
tors to be accepted for rotation was based on 
the statistical criteria of Saunders (Cattell, 
1952, pp. 300-301), McNemar (1942a) and 
Burt (Cureton, 1955), and on the results 
of the previous factorization of the WAIS 
(Cohen, 1957b). The statistical criteria were 
inconsistent with regard to the acceptance of 
the fifth centroid, but since the WAIS had 
yielded five factors in three adult age groups, 
and since the extraction of residual (i.e., 
“error”) factors results in their vanishing in


2 To save printing costs, tables giving centroid
loadings and communalities, transformation matrices 
and intercorrelations among primaries have been de- 
posited with the American Documentation Institute. 
Order Document No. 5923, remitting $1.25 for micro- 
film or $1.25 for photocopies. 
the process of rotation, five centroids were
extracted in each group. The “reality” of the
fifth factor is attested to by its consistency 
among the three children’s groups and with 
the adult groups on the WAIS (Cohen, 
1957b). 

2. Rotation was essentially blind and independent
in the three groups, and was performed
by Thurstone’s method of two-dimensional
sections (Thurstone, 1947, pp. 194-216).
The rotation criteria were oblique simple
structure and a positive manifold with
maximization of the number of variables in a
±.05 and ±.10 hyperplane (Cattell, 1952, pp.
235-236).

3. The intercorrelations among the oblique 
primary factors were determined and subjected
to a second-order general factor analysis
(Thurstone, 1947, pp. 273-277, 421-434), as a
product of which the correlations of the sub- 
tests with this general factor were obtained. 
This analysis made possible the determination 
of the proportion of total and of true (non- 
error) variance attributable to the second- 
order general factor and comparisons with 
these proportions found for adults on the 
WAIS (Cohen, 1957b). 
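
As a rough illustration of the first step of a centroid extraction, the loading of each variable on the first centroid is its column sum divided by the square root of the grand sum of the matrix. The Python sketch below covers only that step and the residual matrix it leaves; it omits the sign reflection of residuals, the communality estimation, and the rotation described above. The function names and the 3 x 3 matrix are hypothetical and are not taken from the WISC manual; the diagonal is assumed to hold communality estimates already.

```python
import math


def first_centroid_loadings(r):
    """Return first-centroid loadings for a square correlation matrix.

    r is a list of rows. The diagonal is assumed (for this sketch) to
    already contain communality estimates rather than unities.
    """
    n = len(r)
    col_sums = [sum(r[i][j] for i in range(n)) for j in range(n)]
    grand_total = sum(col_sums)
    return [s / math.sqrt(grand_total) for s in col_sums]


def residual_matrix(r, loadings):
    """Remove the correlations reproduced by a single factor."""
    n = len(r)
    return [[r[i][j] - loadings[i] * loadings[j] for j in range(n)]
            for i in range(n)]


# Hypothetical matrix generated by one factor with loadings .8, .6, .5.
example = [
    [0.64, 0.48, 0.40],
    [0.48, 0.36, 0.30],
    [0.40, 0.30, 0.25],
]
loadings = first_centroid_loadings(example)
residuals = residual_matrix(example, loadings)
```

Because the made-up matrix was built from a single factor, the sketch recovers the generating loadings and leaves residuals of zero; with real data the residual matrix would be reflected and factored again for each further centroid.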


Results and Discussion 


Five oblique primary factors were found in
each age group. Factor loadings in excess of
.20 are accepted as significant for the purpose
of factor interpretation.³ To facilitate
comparisons between age groups in Table 1,
loadings between .20 and .39 are followed by
an asterisk (*), and loadings of .40 and higher
by a double asterisk (**). The factors were
found to be very similar in the three age
groups, a finding to which we have grown
accustomed (Cohen: 1952b, 1957b). Table 1
is accordingly organized by factor. Decimal
points are omitted in the factor tables
throughout.


3 The use of the .20 significance criterion in place
of the more conventional (and arbitrary) .30 is
justified as follows: This investigation can be viewed as
a set of three replications of a factor analysis (with
the WAIS study [Cohen, 1957b] providing four
more); therefore, consistency between groups is an
effective guard against the acceptance of nonsignificant
loadings as significant, and superior to the
rule-of-thumb .30 criterion, which would lead to
spuriously inconsistent results between groups. Further
justification of this criterion is provided by the fact
that of the 180 factor loadings, only six were
between .10, the upper limit of the hyperplane, and .20.


Factor A: Verbal Comprehension I 


The subtests do not load this factor in the 
three groups of children with quite the neat 
consistency which obtained for the four adult 
groups on the WAIS (Cohen, 1957b, p. 284). 
All three children’s groups load Information
and Similarities; Arithmetic fails to load in
the oldest (13-6 years old) group; Comprehension
loads only in that group; and Vocabulary
loads in the youngest and oldest groups,
in the latter heavily (see Table 1-A).
These tests are all verbal, and the factor is 
named Verbal Comprehension I, the roman 
numeral being used to distinguish it from 
Factor D, which also involves verbal subtests. 

Since there are only five verbal subtests in 
all, no confident interpretation of Factor A 
which distinguishes it from Factor D (named 
Verbal Comprehension II) can be made. How- 
ever, an hypothesis consistent with the ob- 
served loadings on the two factors can be 
offered. Factor A seems to reflect that aspect 
of verbally retained knowledge impressed by 
formal education: facts (Information) and 
verbal categorizing (Similarities) at all three 
ages, and number manipulation (Arithmetic) 
at ages 7-6 and 10-6. The pivotal subtest is 
Information, which does not appear on Fac- 
tor D (or any other). Factor D, on the other 
hand, seems to stress judgment and is dis- 
cussed below. The differentiation between the 
two factors should not be overstressed in the 
light of the correlations between them of .78, 
.82, and .43 in ascending order of age in the 
three groups. (Indeed, when factor scores 
come to be discussed below, it will be seen 
that these two factors are combined into a 
single verbal factor.) 

By age 13-6, the composition of Factor A 
is “stabilized” in that it is loaded by the same 
four subtests as appear on it throughout 
the adult years on the WAIS: Information, 
Comprehension, Similarities, and Vocabulary 
(Cohen, 1957b). Factor A, then, is identified 
as the well-established verbal comprehension 
factor found in all previous factor-analytic 
studies of the W-B and WAIS with normals 
at different age levels (cf. Balinsky, 1941; 
Birren, 1952; Cohen, 1957b; Wechsler, 1958) 
as well as with psychiatric patients (cf. Cohen, 
1952a; Cohen, 1952b; Hover, 1950; Simkin, 
1950), which in turn is the same factor found 
in all general investigations of the intellec- 
tual domain (cf. Thomson, 1951). 


Factor B: Perceptual Organization 


This factor is readily identified with Fac- 
tor B in the adult group. The same two sub- 
tests, Block Design and Object Assembly, 
load it consistently and exclusively in the 
three children’s age groups (see Table 1-B) 
as they did in the four adult age groups 
(Cohen, 1957b). The affinity to this factor of 
Picture Arrangement seen here at age 7-6
is also found in some adult groups (Cohen,
1957b, p. 285). Picture Completion loads
here at ages 10-6 and 13-6, and Mazes,
which has no match on either the WAIS or
W-B, appears at ages 7-6 and 10-6.

This factor has appeared consistently in 
previous factor-analytic studies of Wechsler 
scales, and although it has been variously 
named, there is no mistaking its identity (see 
references in Cohen, 1957b, and Wechsler, 
1958). The tests which load it are all non- 
verbal and require the interpretation and/or 
organization of visually perceived materials 
against a time limit. It is thus given the same 
name as in the WAIS study, Perceptual Or- 
ganization (Cohen, 1957b). 


Factor C: Freedom from Distractibility 


In Table 1-C, the consistently and exclu- 
sively loading subtest is Digit Span, while 
Mazes loads at 10-6 and 13-6, and Picture 
Arrangement, Object Assembly, and Arith- 
metic each load in a different group. At 13-6, 
the two significantly loading subtests, Arith- 
metic and Digit Span, are the same ones as 
appear throughout the adult range (Cohen, 
1957b). 

This factor has appeared repeatedly in 
previous factorial studies of the Wechsler sub- 
tests, and has been variously interpreted as 
memory (Balinsky, 1941; Birren, 1952; 
Cohen: 1957a, 1957b), freedom from dis- 
tractibility (Cohen: 1952a, 1952b), attention- 
concentration (Hover, 1950), and concentra- 
tion-speed (Simkin, 1950). In the WAIS 
study (Cohen, 1957b), it was pointed out
that these concepts are not as diverse as they 
may seem at first glance, since rote memory 
requires as a precondition the ability to re- 
main undistracted (to attend or concentrate), 
and C was interpreted to be a memory factor. 
In the present study, it is felt necessary to 
revert to the original interpretation of this 
factor as one of freedom from distractibility 
(Cohen: 1952a, 1952b) primarily due to the 
loadings of subtests which clearly do not in- 
volve memory (Mazes, Picture Arrangement, 
Object Assembly), but which it seems reason- 
able to suppose are quite vulnerable to the 
effects of distractibility. To make the point 
explicit, Factor C in Wechsler scales (WISC, 
WAIS, or W-B) is primarily a Freedom from
Distractibility factor; its interpretation as a 
memory factor in the WAIS study (Cohen, 
1957b) and others (Balinsky, 1941; Birren, 
1952) is in error. 


Factor D: Verbal Comprehension II


Comprehension and Picture Completion load 
at all three age levels, Vocabulary at 7-6 and 
10-6, and Similarities at 13-6 (see Table 
1-D). This factor seems to change in a sys- 
tematic way with increase in age: Vocabulary 
drops out by age 13-6, Comprehension drops 
from the highest loading in the table at age 
7-6 to a barely acceptable one at age 13-6, 
and Similarities appears weakly at age 13-6; 
only Picture Completion loads at the same 
(weak) level throughout. The identification 
of this factor with Factor D in the adult 
study is readily made by the apparent con- 
tinuation of these subtest trends for the 18-19 
age group on the WAIS (Cohen, 1957b, p. 
286): Picture Completion continues to load 
(and at the same level), the diminishing Com- 
prehension subtest drops to a zero loading, 
and Similarities continues its weak loading. 

When Factor D in all four adult groups 
was considered (Cohen, 1957b), it was seen 
to be almost solely a matter of Picture Com- 
pletion: only this subtest loaded in all four 
groups, and the other three subtests which 
ever load the factor do so in three different 
groups and with barely acceptable loadings 
(.20 to .22). The factor was therefore ac- 
cepted as a quasi-specific in the WAIS study 
and not interpreted. When the results of the 
present investigation are added for consideration,
a tentative hypothesis about this factor
suggests itself, namely, that it reflects the ap- 
plication of judgment to situations following 
some implicit verbal manipulation. The sug- 
gested differentiation from Factor A is that 
the latter represents formally learned verbal 
comprehension, while Factor D is the applica- 
tion of verbal skills to new situations. This 
differentiation is consonant with the findings; 
e.g., Comprehension declines as a measure of 
judgment as formally learned “right” answers 
are acquired. 

Since the distinction between these two fac- 
tors is not sharp, either in conception or in 
the light of the already noted substantial cor- 
relations between them, and since such dis- 
tinction which has been made is based on 
ad hoc reasoning, Factor D is called Verbal 
Comprehension II, and neither factor is more 
specifically identified. This more conservative 
position simply holds that two highly cor- 
related factors were found loaded primarily 
by overlapping verbal subtests. 


Factor E 


As was the case in the adult study (Cohen, 
1957b, pp. 286-287), the only consistently 
loading subtest is Coding, which is equivalent 
to Digit Symbol (see Table 1-E). Picture Ar- 
rangement loads this factor more substan- 
tially than does Coding in the older two 
groups, but fails to load it at all at age 7-6. 
In the adults one also finds some affinity for 
Factor E in Picture Arrangement (ages 45-54),
but again the only consistently loading
subtest is Digit Symbol. 

No psychological interpretation of Factor E 
is offered here, nor was any offered in the 
WAIS study. It is not an important factor in 
terms of the amount of variance it accounts 
for in any of the children or adult groups. 
But it is “real” in the sense that its consistent 
appearance in both studies precludes its at- 
tribution to sampling fluctuations in the sub- 
test intercorrelations. 


Second-Order General Factor 


As already noted, the rotations were oblique, 
and the result was a substantial degree of cor- 
relation among the primary factors in all three 
Table 2


Correlations With G of WISC Subtests and Primary 
Factors for the Three Age Groups 


Subtest 


Information 
Comprehension 
Arithmetic 
Similarities 
Vocabulary 

Digit Span 

Picture Completion 
Picture Arrangement 
Block Design 
Object Assembly 
Coding 

Mazes 


Primary Factor 


groups.⁴ When these matrices of factor
intercorrelations are factored in turn, they are
found to be substantially (but not com- 
pletely) accounted for by a single (second- 
order) general factor. Table 2 gives the cor- 
relation of the subtests and the primary fac- 
tors with this factor, G, which is interpreted 
as present general intellectual ability. The G 
correlations of the subtests are moderate 
(median over all age groups = .58) and of 
the primary factors quite high (over-all me- 
dian .84). The degree of correlation with G 
of the subtests is not quite so high for chil- 
dren on the WISC as it was found to be for 
adults on the WAIS (for further discussion 
on this point, see below); still, the magnitudes
are substantial and suggest that the WISC 
Full Scale IQ, which is a linear function of 
an equally weighted sum of subtest weighted 
scores, is loaded quite strongly with and, 
hence, is a good measure of G. This is, in 


4 The median and range of primary factor
intercorrelations for the three groups were as follows:
Age 7-6, Mdn = .56, R = .39-.78; age 10-6, Mdn
1, R = .18-.83; age 13-6, Mdn = .56, R = .10-.83.
All primary factor intercorrelations are given in the
ADI material (see Footnote 2).
fact, the case. When the correlation of Full
Scale IQ with G is determined, using the cor- 
relation of sums formula, it is found to be .92, 
.94, and .95 for the three groups in order of 
increasing age. The “Verbal IQ” is an equally 
good measure of G (.93, .94, and .94), while 
the “Performance IQ” is a relatively poor one 
(.78, .82, and .81). 
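
The correlation-of-sums computation mentioned above can be sketched as follows. For an unweighted sum of standardized subtests, the correlation of the sum with an outside variable such as G is the sum of the subtests' correlations with G divided by the square root of the sum of all entries of the subtest intercorrelation matrix (unities in the diagonal). The function name and the three-subtest figures below are hypothetical; they are not the WISC values, and equal subtest variances are an assumption of the sketch.

```python
import math


def correlation_of_sum_with_criterion(r_with_g, intercorrelations):
    """Correlation between an unweighted sum of standardized
    variables and a criterion (here, G).

    r_with_g: correlation of each variable with the criterion.
    intercorrelations: full square matrix with 1.0 in the diagonal.
    """
    n = len(r_with_g)
    numerator = sum(r_with_g)
    variance_of_sum = sum(intercorrelations[i][j]
                          for i in range(n) for j in range(n))
    return numerator / math.sqrt(variance_of_sum)


# Hypothetical three-subtest battery (illustrative values only).
g_corr = [0.8, 0.6, 0.5]
r = [
    [1.00, 0.48, 0.40],
    [0.48, 1.00, 0.30],
    [0.40, 0.30, 1.00],
]
composite_r = correlation_of_sum_with_criterion(g_corr, r)
```

With these made-up figures the composite correlates about .82 with the criterion, which is higher than any single subtest does. This is the same mechanism by which the Full Scale IQ can exceed every subtest as a measure of G.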

As has been found in the past work with 
the W-B (Cohen: 1952a, 1952b; Wechsler, 
1958) and the WAIS (Cohen: 1957a, 1957b; 
Wechsler, 1958), the “essentially verbal” 
tests, particularly Vocabulary and Informa- 
tion, are consistently the best measures of G 
over the age range considered; the poorest 
are Picture Completion, Mazes, and Object 
Assembly. Among the primary factors, except 
for the anomalously low correlation with D at 
age 13-6, it is Factor B which is consistently
the poorest measure of G, while the others 
are, on the average, about equally and highly 
related. These results parallel the G correla- 
tions of Verbal and Performance IQs noted 
above. 

In the WAIS study, noteworthy shifts in 
the correlations of subtests and major factors 
with G occurred as a function of age in the 
60-75+ age group (Cohen: 1957a, 1957b). 
Over the age range from 7-6 to 13-6 of the 
present study, no such shifts occur; the sub- 
tests and factors maintain fairly consistent 
levels of correlation in the three children’s 
age groups as they did for the three younger 
adult age groups, i.e., 18-19, 25-34, and 
45-54. 


Subtest Specificity 

A body of doctrine has come down in the 
clinical use of Wechsler scales, which involves 
a rationale in which specific intellective and 
psychodynamic trait-measurement functions 
are assigned to each of the subtests (e.g., 
Rapaport, 1945). This has been used to provide
a theoretical basis both for research in
pattern analysis and for clinical psychodiagnosis.
Implicit in this rationale lies the assumption
that a substantial part of a subtest’s
variance is associated with these specific
measurement functions. In previous work
with the W-B for neuropsychiatric patients
(Cohen: 1952a, 1952b) and with the WAIS
for the standardization data (Cohen: 1957a,
1957b), the basis for this doctrine was challenged
on grounds of small specific variance
for the subtests. When the communality of a
subtest (proportion of variance shared with
other subtests) is subtracted from its internal
consistency reliability coefficient (proportion
of true, i.e., nonerror variance), the remainder
constitutes the subtest’s specificity, the proportion
of its variance which is both nonerror
and distinctive to it. This component of the
subtest’s variance, and only this component,
can carry the variance necessitated by an
hypothesized one-to-one correspondence of
subtest to trait in any clinical rationale.


Table 2 (continued)

                              Age
Subtest                7-6   10-6   13-6

Information             61    79     78
Comprehension           59    74     63
Arithmetic              61    74     64
Similarities            58    67     72
Vocabulary              69    86     79
Digit Span              58    52     46
Picture Completion      47    42     41
Picture Arrangement     60    67     53
Block Design            48    58     58
Object Assembly         47    43     48
Coding                  38    49     55
Mazes                   37    49     49

Primary Factor

A                       83    85     91
B                       53    29     64
C                       87    82     90
D                       88    93     33
E                       69    91     87


These values for the age groups on the 
WISC are presented in the Subtest-Specificity 
section of Table 4. The over-all median of 
these values is .18, and excepting only Mazes 
(the only subtest not represented in the adult 
scales), no subtest at any age has as much as 
one-third of its variance attributable to speci- 
ficity. It is apparent that these specificities are 
quite inadequate to serve as a basis for a sub- 
test-specific rationale. 
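
The subtraction that defines specificity can be written out as a short Python sketch. The function name is hypothetical. The input figures are the age 7½ row of Table 3 (communality .48, error .33), used here only as a worked example; reliability is taken as one minus the error proportion.

```python
def variance_breakdown(reliability, communality):
    """Split a subtest's total variance (taken as 1.0) into
    communality, specificity, and error.

    specificity = reliability - communality
    error       = 1 - reliability
    """
    return {
        "communality": communality,
        "specificity": reliability - communality,
        "error": 1.0 - reliability,
    }


# Average figures for the youngest group (Table 3): c = .48, e = .33.
error_7 = 0.33
parts = variance_breakdown(reliability=1.0 - error_7, communality=0.48)
```

Here the specificity comes to .19, the value tabled for that age group, and the three parts sum to unity.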

Thus, it is true for the WISC (as it is for 
the WAIS in normal adults and the W-B in 
patients) that a subtest’s measurement func- 
tion is most meaningfully and completely de- 
scribed in terms of G and the primary abili- 
ties, which account for the bulk of its reliable 
variance, and not in terms of its small (essen- 
tially uninterpretable and probably complex) 
specificity. Adherents of the “clinical” ra- 
tionales can find no support in the factor- 
analytic studies of Wechsler scales. A detailed 
rationale for the subtests, based on the present 
factor analyses, is presented in a later section. 


Age, G, and Intellectual Organization 


The availability of complete analyses of the 
same set of intelligence subtests for repre- 
sentative subjects over a wide age range makes 
possible for the first time direct scrutiny of 
changes in intellectual organization concomi- 
tant with age. In the following discussion, let 
it be understood that generalizations about 
intellectual organization carry along the quali- 
fication “insofar as Wechsler subtests are 
broadly representative of the complex of 
functions considered relevant to intelligence.” 
Despite this qualification, it seems to the au- 
thor unlikely that another collection of sub- 
tests drawn from within the same framework
would lead to substantially different conclu- 
sions. 

It has already been noted that the same 
correlated primary factors appear at each of 
the age levels 7-6, 10-6, and 13-6 of the 
WISC standardization samples, that these are 
substantially correlated, and thereby give rise 
to G. The analysis which was performed 
makes possible the determination of the 
amount and proportion of variance attribut- 
able to each of various identifiable sources: 


1. G: the general second-order factor, i.e., present
general intellectual functioning; that part of the
variance shared by all the subtests.

2. c: the communality (h²), i.e., that part of the
variance common to two or more of the subtests.
This includes all influences leading to
intercorrelations among subtests (including G).

3. s: that part of the variance which is specific to
a given subtest in a battery and consistently
measured by it (see preceding section, Subtest Specificity).

4. e: random error of measurement variance, i.e.,
the subtest’s “unreliability.”


Table 3 represents the average proportions
of the subtests attributable to the above
sources at each age for the three children’s
groups on the WISC, as well as comparable
data on three adult groups on the WAIS
(Cohen, 1957b). G accounts for .30-.40 of
the total variance in the children’s groups, the
mean over the three groups being .35, and no
monotonic age trend is noted.


Table 3

Proportions of Total Variance Attributable to Various
Sources for the Three Children’s Age Groups
on the WISC and the Three Adult
Age Groups on the WAIS*

                      Source
Age Group       G     c     s     e

Children
  7½           30    48    19    33
  10½          40    57    18    25
  13½          36    54    19    26
  Mean         35    53    18    28
Adults
  18-19        53    67    14    19
  25-34        50    64    16    20
  45-54        54    67    15    18
  Mean         52    66    15    19

* Rows do not sum to unity because of overlap: G is part
of c. However, c, s, and e are mutually exclusive and
exhaustive, i.e., these three sources do sum to unity in each
row (within rounding errors).

Inferences about the extent of generality of 
intellectual functioning cannot be restricted 
solely to a consideration of G, since some 
subjectivity occurs in the course of rotating
to oblique simple structure. When, alterna- 
tively, one scrutinizes the communality (c, 
Table 3), which is determined objectively and 
fairly directly from the subtest intercorrela- 
tions, a picture similar to that of G emerges 
for the three children’s age groups. Com- 
munal variance on the average accounts for 
about half of the total (.53), with neither 
large differences nor a monotonic trend with 
age in the children’s groups. 

With respect to specificity and error, the 
former remains constant, while the latter is 
high in the 7½-year-olds as compared with the
two older groups. This simply reflects the fact 
that the reliability coefficients of the subtests 
are lower at this age, so that measurement 
error accounts for about one-third of the vari- 
ance in contrast with one-fourth for the older 
children. 

Summarizing for the three groups of chil- 
dren, about half of the total subtest variance 
is shared in common and two-thirds of this 
communality (.35 of the total) is a function 
of G. Only .18 of the total variance reflects 
subtest specificity, while a considerably higher 
proportion (.28) lies in errors of measurement. 

The picture of pregeriatric adult intellectual
organization on the WAIS is strikingly different
(lower half of Table 3). We can put aside
immediately the differences among the three
adult age groups; they are very small and can
safely be ignored. Turning, then, to a consideration
of the mean values, we find that G accounts
for fully .52 of the total variance, an
amount half again as large as that for the
children. For the adults, about two-thirds
(.66) of the total subtest variance is shared
among the subtests in contrast to the .53 figure
for the children. G variance accounts for
four-fifths (.52/.66 = .79) of the communality
for the adults and for only two-thirds
(.35/.53 = .66) of the communality for the
children. The amount of specificity is slightly
smaller for adults, and the amount of error
variance substantially smaller.


5 Note that the statements which characterize adult
intellectual functioning refer to adult ages up to age
54. The substantially greater generality compared
with children no longer holds for the 60-75+ age
group. In this geriatric group, G accounts for only
.42 of the total WAIS variance, an amount only
slightly greater than that of the children (Cohen,
1957b, p. 289).


The conclusion that adults show both higher 
generality and communality of intellectual 
functioning than children suggests itself with 
some force. This conclusion is persuasive even 
in the absence of sampling error formulae in 
the light of the gap of .10 between the highest 
G proportion in the children’s groups (.40 at 
age 10-6) and the lowest G proportion in the 
adult groups (.50 at age 25-34).⁶
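
The comparisons drawn from the mean rows of Table 3 can be checked with a few lines of arithmetic. The Python sketch below is an illustration only: the dictionary keys and the function name are hypothetical, and the inputs are the rounded tabled means, so small departures from the published true-variance figures in Footnote 6 are to be expected and presumably reflect rounding.

```python
# Mean proportions of total variance from Table 3 (rounded values).
children = {"G": 0.35, "c": 0.53, "s": 0.18, "e": 0.28}
adults = {"G": 0.52, "c": 0.66, "s": 0.15, "e": 0.19}


def variance_shares(group):
    """Express G as a share of communality, and G, c, and s as
    shares of true (nonerror) variance, i.e., of 1 - e."""
    true_variance = 1.0 - group["e"]
    return {
        "G_of_communality": group["G"] / group["c"],
        "G_of_true": group["G"] / true_variance,
        "c_of_true": group["c"] / true_variance,
        "s_of_true": group["s"] / true_variance,
    }


child_shares = variance_shares(children)
adult_shares = variance_shares(adults)
```

From the rounded means, G comes to about two-thirds of the communality for children and about four-fifths for adults, and to roughly .49 and .64 of the true variance, in line with the figures reported in the text.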

This conclusion is in flat disagreement with 
what is generally held to be the case in the 
literature. The doctrine of intellectual differ- 
entiation with age is most explicitly stated by 
Garrett (1946): “Abstract or symbol intelli- 
gence changes in its organization as age in- 
creases from a fairly unified and general abil- 
ity to a loosely organized group of abilities or 
factors” (p. 373). Operationally, this would 
be shown by a decrease between childhood 
and adulthood of general factor variance, 
communality variance, mean intercorrelation, 
or first centroid factor variance. 

Garrett (1946), Anastasi (1948), and Burt 
(1954) review the evidence for this differen- 
tiation hypothesis. It is based on some studies 
which show a reduction in G with age during 
childhood (although the most extensive of 
such studies, that of McNemar on the Binet 
[McNemar, 1942b], does not), and on demonstrations
of more G in elementary school
children than in college students. The latter 
is readily explained as due to the selection 
for intelligence at the college level (which, 


because of the narrowing of the “range of 


6 These conclusions are not changed when the
“true” (i.e., nonerror) variance is analyzed to allow
for the lower subtest reliabilities of the children: on
the average, (a) .49 of the true variance of children’s
scores is associated with G compared to .64 for
adults; (b) the communality constitutes .74 of the
true variance for children and .82 for adults; (c)
complementarily, there is substantially more
specificity in intellectual functioning in children (.26)
than in adults (.18).

Table 4


Proportions of Total Variance Attributable to Various Sources for the WISC Subtests 
and IQs at Ages 7-6, 10-6, and 13-6 


General Primary-Specific Subtest-Specific Error
Subtest Age: 7-6 10-6 13-6 7-6 10-6 13-6 7-6 10-6 13-6 7-6 10-6 13-6 
Inf. 37. @ 61 2 613 15 06 O05 06 34. 
Arith. 12 11 11 37 16 23 
Voc. 48 74 62 11 10 17 3s 
D. Sp. 11 18 10 40 41 50 
B. D. & 28 34 26 so. 
O. A. 37 32 «648 04 13 «#00 
Coding 14 24 30 17. 04 04 #3 40 40°" 
Mazes 14 24 24 31 19 04 34 3847 
Mean 30 40 36 19 17 18 19 18 19 33 25 26 
Battery- 
Specificity 

Verbal 86 89 89 03 #05 04 11 06 
Perf. IQ 61 68 65 12 10 410 

08 04 07 


F.S.10 85 88 91 


Note.—-Decimal points omitted. 
* Estimated. 


(See Wechsler, 1949, pp. 13-14.) 


talent,” results in low test intercorrelations) 
rather than to age as such. When unselected 
children are compared with unselected adults 
in generality of intellectual functioning on 
representative tests, the differentiation hy- 
pothesis is rejected; in fact, we find greater 
generality of intellectual functioning in adults 
than in children. 


A Rationale for the Subtests 


In previous articles, rationales for Wechsler 
subtests were presented based on factor analy- 
ses of the W-B in neuropsychiatric patients 
(Cohen, 1952a) and of the WAIS standardi- 
zation population (Cohen, 1957a). In order 
to adduce a similar rationale for these subtests 
on the WISC, for each subtest in each age 
group the proportion of variance attributable 
to the following sources was determined (see 
Table 4): 
1. General: Variance due to G is on the whole 
larger than the other sources; this simply underscores 
the importance of the general factor in the WISC. The 
reader should note that the table 
entries are proportions of variance, not correlations; 
the latter are higher (the square root of the tabled 
value) and have been given in Table 2. 

2. Primary-Specific: As used here, this concept is 
meant to identify the proportion of a subtest’s vari- 
ance which is shared with at least one other subtest, 
over and beyond G. In previous analyses (Cohen: 
1952a, 1957a), this has been identified as 
primary-specific variance, i.e., that part of each primary 
factor’s variance specific to the factor and independent 
of G. The tabled entries here combine the variance 
of all primary-specifics, although for most of the 
subtests one factor predominates. The nature of the 
primary-specific variance of each subtest was deter- 
mined in the basic analysis (Table 1). 

3. Subtest-Specific: This represents what each sub- 
test measures exclusively (and validly) in the scale. 

4. Error: This is the proportion of variance ac- 
counted for by random error of measurement, i.e., 
unreliability. These values are, on the average, 
four-fifths as large as the G proportions, and suggest the 
wisdom of caution in the interpretation of single 
subtest scores. 
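
The four-way partition just described can be checked against the Mean row of Table 4 with a short sketch. The assumption here, not stated as a formula in the text, is that "true" variance is taken as 1 minus the error proportion and that the per-age shares are then averaged; the variable names are illustrative.

```python
# Mean row of Table 4, with decimal points restored (proportions of total variance).
MEAN_ROW = {
    "7-6":  {"general": 0.30, "primary": 0.19, "subtest": 0.19, "error": 0.33},
    "10-6": {"general": 0.40, "primary": 0.17, "subtest": 0.18, "error": 0.25},
    "13-6": {"general": 0.36, "primary": 0.18, "subtest": 0.19, "error": 0.26},
}


def share_of_true_variance(row, sources):
    """Proportion of nonerror variance carried by the named sources (assumed definition)."""
    true_variance = 1.0 - row["error"]
    return sum(row[s] for s in sources) / true_variance


def average(values):
    values = list(values)
    return sum(values) / len(values)


g_true = average(share_of_true_variance(r, ["general"]) for r in MEAN_ROW.values())
communality_true = average(
    share_of_true_variance(r, ["general", "primary"]) for r in MEAN_ROW.values()
)

print(round(g_true, 2))            # 0.49
print(round(communality_true, 2))  # 0.74
```

Both figures agree with the values for children given in Footnote 6, which suggests, without proving, that the footnote was computed along these lines.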


The rationale which follows is for the most 
part an interpretation of Table 4, to which 
the reader should freely refer without explicit 
reference. 


Information 


This subtest occupies an uncontested second 
place, after Vocabulary, as a measure of G 
among the WISC subtests, particularly for 
older children, where G accounts for three-fifths 
of its variance. In the 7½-year-olds, it 
does not perform as well in measuring G, but 
then at this age level, because of the relatively 
high unreliability (an average over the sub- 
tests of a third of the variance), most of the 
subtests show less G variance than at later 
ages. 

Information is the only subtest which loads 
Factor A consistently and solely; it was the 
basis of the hypothesized distinction between 
Factors A and D. Thus, all its primary-spe- 
cific variance can be attributed to verbal 
knowledge. Unfortunately, this is not much: 
.23 at age 7-6 and .13—.15 at the other ages; 
each of these amounts is less than the error 
variance at each age. For a younger subject, 
one might hazard an interpretation of the In- 
formation score as specific verbal knowledge 
only when it is quite deviant (at least 3 
points) from the general subtest average. 
Even for a young subject, given the amount 
of error variance, this is done at considerable 
risk; for older children, where the primary- 
specific proportion is even smaller, the risk 
should not be taken. 

As is usually the case for the WISC sub- 
tests, no subtest-specific interpretation (e.g., 
“this boy is unusually well informed about 
general facts for his ability level”) should be 
made, given the .05—.06 of the Information 
variance so occupied. 

As will be found to be the case in several 
subtests, the major utility of Information is 
as a measure of G and the Verbal Compre- 
hension Factor (see below, Factor Scores) in 
combination with other subtests. 


Comprehension 


This is a moderately good measure of G 
compared with all the other subtests, but un- 
distinguished when compared to the other 
“essentially verbal” subtests (Information, 
Similarities, and Vocabulary). This is undoubtedly 
due to the fact that it has the highest 
error variance in this group. 
In its primary-specific variance, it is of interest 
in that, as age increases, it goes from 
high to low loadings on Factor D and from 
zero loadings at 7-6 and 10-6 to a substan- 
tial loading at 13-6 on Factor A (Table 1). 
This subtest was central in the tentative in- 
terpretation of Factor D as stressing verbal 
judgment. Due to the fact, however, that its 
error variance is twice the size of its primary- 
specific variance, and the further fact that the 
latter contains mixed A and D variance at 
13-6, it is not recommended that in clinical 
practice Comprehension scores be interpreted 
as measuring verbal judgment (or knowl- 
edge). The low subtest specificity also pre- 
cludes any interpretation along those lines. 
The subtest’s major utility lies in its com- 
bination with others as a measure of G and 
of Verbal Comprehension. 


Arithmetic 


Arithmetic is quite similar to Comprehen- 
sion as a measure of G, better than any of the 
performance tests, but not as good as Vo- 
cabulary or Information. Apart from G, its 
primary-specific variance is small (.11—.12) 
and varies with age: at 7-6 and 10-6 it is 
Factor A variance, while at 13-6 it takes on 
the measurement function which it maintains 
until old age, Factor C, Freedom from Dis- 
tractibility. Under these circumstances, and 
with an average of a fourth of its variance in 
error, interpretation of a subject’s score on 
Arithmetic in terms of either of these factors 
(depending on age) is inadvisable. (Note be- 
low, however, its use in combination with 
Digit Span.) 

For children at 10-6 and 13-6, subtest- 
specific variance on this test compares rela- 
tively favorably with (i.e., is a little greater 
than) error variance. Thus, one can draw in- 
ferences of specific Arithmetic ability or dis- 
ability, but only when there is a substantial 
departure of the subtest score from the mean 
of the other (particularly verbal) subtests. 
Using this mean as a reference point has the 
effect of partialling out G, which accounts for 
the largest amount of the subtest’s variance. 
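
The use of the mean of the other subtests as a reference point can be sketched as follows. The scaled scores are hypothetical, and the function name is illustrative; the text does not fix what counts as a "substantial departure," so no cutoff is coded here.

```python
def deviation_from_reference(scores, target, reference):
    """Target subtest score minus the mean of the reference subtests.

    Subtracting the reference mean roughly removes what the subtests
    share (largely G), leaving the part specific to the target.
    """
    reference_mean = sum(scores[name] for name in reference) / len(reference)
    return scores[target] - reference_mean


# Hypothetical scaled scores for one child (illustration only).
scores = {
    "Information": 11,
    "Comprehension": 10,
    "Similarities": 12,
    "Vocabulary": 11,
    "Arithmetic": 6,
}
verbal_reference = ["Information", "Comprehension", "Similarities", "Vocabulary"]

deviation = deviation_from_reference(scores, "Arithmetic", verbal_reference)
print(deviation)  # -5.0
```

Whether a departure of this size justifies an inference of specific Arithmetic disability remains a clinical judgment, made with the error variance of the subtest in mind.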


Similarities 


This subtest is similar to Comprehension in 
that it is a moderately good measure of G 
(surpassed only by Vocabulary and Information), 
with .44 of its variance, on the average, 
so occupied. 

At 7-6 and 10-6, its primary-specific vari- 
ance comes from Factor A, while at 13-6 it is 
made up about equally of A and D. Again the 
amount is small, both absolutely and relative 
to the amount of error variance, and clinical 
interpretation along these lines is precluded. 
The same consideration applies to its subtest- 
specific variance. 

The major utility of Similarities, therefore, 
lies in its combination with other subtests in 
the formation of factor scores. 


Vocabulary 


This subtest stands out as it has in the 
other analyses (Cohen: 1952a, 1952b, 1957a, 
1957b) as a measure of G. On the average, 
G occupies .61 of its variance, and this aver- 
age is depressed by the youngest group, where 
subtest reliabilities are consistently low. The 
G proportions at 10-6 and 13-6, respectively 
.74 and .62, correspond to correlations with 
G of .86 and .79. These values are quite high, 
and justify the frequent practice of using this 
subtest by itself as a basis of estimating in- 
telligence in research and clinical screening 
batteries. 
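
The conversion from a proportion of G variance to a correlation with G is simply a square root, as noted under Table 4. A minimal sketch using the two Vocabulary proportions quoted above (the function name is illustrative):

```python
import math


def correlation_with_g(g_variance_proportion):
    """Correlation with G implied by a proportion of variance due to G."""
    return math.sqrt(g_variance_proportion)


vocabulary_g_variance = {"10-6": 0.74, "13-6": 0.62}
correlations = {
    age: round(correlation_with_g(p), 2) for age, p in vocabulary_g_variance.items()
}
print(correlations)  # {'10-6': 0.86, '13-6': 0.79}
```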

Because of the fact that G is so strongly 
verbal in character, a good measure of G such 
as Vocabulary has little primary-specific vari- 
ance, and not much more subtest-specific vari- 
ance. The source of the primary-specific vari- 
ance is mixed as a function of developmental 
changes, going from primarily Factor D vari- 
ance to primarily Factor A variance (where it 
remains in adulthood). Therefore, despite the 
relatively (and at 10-6 and 13-6 absolutely) 
low error variance, specific interpretations of 
scores are not advised. Vocabulary is essen- 
tially a G measure, used either by itself or in 
combination with other subtests. It can also 
be used, as will be seen below, in the construc- 
tion of a factor score to measure Verbal Com- 
prehension. 


Digit Span 

The most salient characteristic of this sub- 
test is its poor reliability, the lowest, on the 
average, of the subtests. With .40 to .50 of 
its variance made up of measurement error, 
it cannot be depended upon 
by itself as a measure either of G or of Freedom 
from Distractibility (although it measures the latter 
purely). Although it has utility in combina- 
tion with Arithmetic as a measure of Factor 
C on the WAIS in adults (Cohen: 1957a, 
1957b) and in older children (see below), at 
ages 7-6 to 10-6, the Arithmetic test carries 
Factor A variance and Digit Span cannot be 
so combined. 


Picture Completion 


This test is one of the four outstandingly 
poor measures of G on the scale (the others 
being Object Assembly, Coding, and Mazes). 
It averages only .19 (r = .43) of its variance 
so occupied. 

The primary-specific variance of Picture 
Completion is of interest from a develop- 
mental point of view. During the entire adult 
age range, it is the only subtest found to con- 
sistently load Factor D, which led to the con- 
sideration of this factor as a Picture Com- 
pletion specific in the WAIS study (Cohen: 
1957a, 1957b). In the present analysis, al- 
though Picture Completion again loads Fac- 
tor D consistently, verbal subtests also load 
this factor, and Picture Completion has im- 
portant loadings at 10-6 and 13-6 on Factor 
B (Table 1). Thus, the relatively large pri- 
mary-specific variance at ages 10-6 and 13-6 
(Table 4) cannot be attributed to any single 
source. Further, this subtest’s error variance is 
the largest among the subtests, which by itself 
precludes specific interpretations. 


Picture Arrangement 


This subtest has the distinction of providing 
the best single measure of G among the 
performance scale tests, with an average proportion 
of G variance of .36 (r = .60). It is 
particularly good in this regard at ages 7-6 and 
10-6. Although this level of G variance is ex- 
ceeded by all the verbal scale subtests except 
Digit Span, in the testing of children with 
language or educational handicaps where ver- 
bal subtests are invalid, this subtest can be 
used, particularly in combination with Block 
Design, to get a nonverbal estimate of G. 

High error variance and mixed sources of 
primary-specific variance preclude interpretation 
of Picture Arrangement scores along specific 
lines. Its utility lies in its measurement 
of G. 


Block Design 


This is a very useful subtest in its measure- 
ment both of G and Perceptual Organization. 
This is made possible by its low error vari- 
ance, ranking with Vocabulary as the lowest 
on the scale when averaged over the three age 
groups. 

Although the proportion of G variance in 
Block Design does not average quite as high 
as that of Picture Arrangement, the former is 
superior at age 13-6. Thus, at older ages in 
particular, Block Design may be a useful non- 
verbal measure of G, especially in combina- 
tion with Picture Arrangement. 

This is the first subtest thus far encoun- 
tered which is found to have utility in the 
measurement of a primary-specific factor. 
Block Design loads only Factor B, and loads 
it quite heavily in all age groups. Thus, the 
proportions of variance attributed to primary- 
specific sources given in Table 4 (.28, .34, 
.26) are identifiable as almost completely due 
to a specific ability in speeded perceptual or- 
ganization (Factor B) over and above G. 

Since Factor B has the lowest correlation 
with G, after G is allowed for there remains a 
considerable amount of B primary-specific 
variance in Block Design. There is also about 
an equally substantial amount of subtest-spe- 
cific variance which is likely to be largely due 
to spatial visualization ability. 

Thus, Block Design is a good measure of 
G for a nonverbal test and a good measure 
of perceptual organization ability. This latter 
quality is shared with Object Assembly. 


Object Assembly 


This test has been consistently found to be 
a poor measure of G (Cohen: 1952a, 1952b, 
1957a, 1957b) and the present findings for 
children agree. It not only has a small pro- 
portion of G variance (.18-.23), but has a 
high proportion of error variance in the three 
groups, .37, .37, and .29. Object Assembly is 
therefore a very poor choice as a measure 
of G. 

Very little of its variance is specific to it. 
Thus, the remaining proportion of the vari- 
ance (.37, .32, .48) constitutes the largest 
amount of primary-specific variance among 
the subtests, and since its primary factor load- 
ings are high and fall exclusively on Factor B 
(Table 1), all the primary-specific variance 
can be attributed to perceptual organization 
ability. Its already noted poor reliability ham- 
pers its use as a measure of Factor B by it- 
self, but in combination with Block Design 
(and in younger children, Mazes), a poten- 
tially useful measure of Factor B results (see 
below). 

Coding 

Except for a minor change of content in 
Coding A in younger children (Wechsler, 
1949, pp. 85-86), this is equivalent to the 
Digit Symbol test on the WAIS and gives re- 
sults identical with it—it loads exclusively 
and consistently on Factor E, and since it is 
the only subtest which does so, Factor E re- 
mains uninterpreted. 

Error variance is estimated (Wechsler, 
1949, pp. 13-14) as being so high as to rank 
this test among the two or three poorest in 
this regard. Taken together with its relatively 
low G and primary-specific variance, the re- 
sult is a subtest the bulk of whose reliable 
variance lies in its specificity (whose nature 
is of necessity unknown) which is in turn 
smaller than its estimated error variance. 
Taken by themselves, then, Coding scores are 
of limited utility. 


Mazes 


This is the only subtest on the WISC which 
has no counterpart on the W-B or WAIS. Al- 
though the proportion of its variance attribut- 
able to G in the three groups is relatively low 
(.14, .24, .24), it is nevertheless a useful sub- 
test. To begin with, it is not crippled by high 
error variance as are most of the other per- 
formance subtests. This gives it “room” in 
which to measure usefully. Its primary-spe- 
cific variance at the younger ages is substan- 
tial (.31 and .19) and is primarily occupied 
in the measurement of Perceptual Organiza- 
tion. In these younger years, its combination 
with Block Design and Object Assembly 
makes for a useful measure of Factor B. At 
age 13-6, however, only .04 of its variance is 
primary-specific, and this insignificant amount 
is not from Factor B, but from Factor C. 
More by far than any other subtest (with 
from a third to almost a half of its variance), 
Mazes measures something quite specific to it. 
What this is cannot be determined from the 
present study by the very nature of subtest 
specificity. However, Porteus (1950), who has 
long been associated with this item-type, be- 
lieves the subtest to measure “planning abil- 
ity.” A consideration of this subtest suggests 
that it is reasonable to identify most of Mazes’ 
specificity with this function. This would then 
mean that if a child’s Mazes score departed 
by several points from his other subtests 
(from Block Design and Object Assembly in 
particular), it is reasonable to interpret it as 
a consequence of planning ability, as such. 


WISC IQs 


Utilizing an extension of the formula for the 
correlation of sums given by Tryon (1958, p. 
27), the correlations with G of the Verbal IQ 
(6 subtests), the Performance IQ (6 sub- 
tests), and the Full Scale IQ (12 subtests) 
were determined and squared, giving the pro- 
portion of variance attributable to G. The re- 
liabilities of these composite IQs were found 
as a function of the subtest reliabilities by 
means of Tryon’s formula for battery reli- 
ability (Tryon, 1957, Eqs. 10 and 29), and 
from these the proportion of error variance. 
These values are presented for the three age 
groups in Table 4, and, in addition, the 
residual proportions are given under the head- 
ing “Battery-Specificity,” since they represent 
the proportion of valid non-G variance in each 
composite. 
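
The kind of computation involved can be sketched with the standard identity for the correlation of a sum of standardized variables with an outside variable. This is offered only as an illustration of the principle; it is not claimed to be the particular extension of Tryon's formula used here, and the loadings and intercorrelations below are hypothetical.

```python
import math


def composite_correlation_with_g(loadings, corr):
    """Correlation of an unweighted sum of standardized subtests with G.

    loadings: correlation of each subtest with G.
    corr: full subtest intercorrelation matrix (unit diagonal).
    The variance of the sum equals the sum of all entries of corr.
    """
    n = len(loadings)
    variance_of_sum = sum(corr[i][j] for i in range(n) for j in range(n))
    return sum(loadings) / math.sqrt(variance_of_sum)


# Hypothetical three-subtest composite (illustration only).
loadings = [0.8, 0.7, 0.6]
corr = [
    [1.00, 0.56, 0.48],
    [0.56, 1.00, 0.42],
    [0.48, 0.42, 1.00],
]

r_composite = composite_correlation_with_g(loadings, corr)
g_variance = r_composite ** 2  # proportion of composite variance due to G
print(round(r_composite, 3))   # 0.863
```

In this made-up case the composite correlates more highly with G than any single subtest does, which is the general reason a composite IQ can be a better measure of G than its parts.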

Both the Verbal and Full Scale IQs are ex- 
cellent measures of G. About .85—.90 of their 
variance is so occupied, which, expressed as 
validity coefficients, amounts to a range of 
values of .92-.95. Either of these IQs repre- 
sents well the G defined by the 12 subtests. 
Performance IQs, on the other hand, contain 
only .61—.68 of G variance, which, although 
translated into rather impressive validity co- 
efficients of .78-.82, are relatively poor com- 
pared to the other two IQs. 

The fact that the intellectual functioning 
represented by G is considerably dependent 
on verbality is indicated by the fact that the 
Full Scale and Verbal IQs are about equally 
valid as measures of G. Apparently, when the 
performance scale subtests are added to the 
verbal scale subtests to make up the Full 
Scale IQ, whatever G variance the former 
bring is counterbalanced perfectly by the non- 
G (specificity and error), resulting in no ad- 
ditional validity for G measurement beyond 
that supplied by the verbal scale. 

Beyond G, the very little variance which is 
left in the Full Scale and Verbal IQs is about 
equally accounted for by error and primary- 
or subtest-specificity, the latter being obvi- 
ously quite negligible for test interpretation 
purposes. 

In the case of the Performance IQ, although 
the error variance is slightly larger, there re- 
mains about a quarter of the total variance 
specific to these tests, which is predominantly 
reflective of primary-specific Factor B, that 
part of Perceptual Organization which does 
not enter into G. In clinical use, the comparison 
of Verbal with Performance IQ for a subject 
is seen to be justified, in that the difference 
between the two reflects the battery 
specificity of the Performance Scale, the ef- 
fect of G having been in effect “partialled out” 
by subtraction. The modification of clinical 
practice suggested by the present results lies 
in the conception of the Verbal IQ as a measure 
of G, rather than of a “group” or primary 
factor of verbal ability. Another way in which 
this can be put is that insofar as the WISC 
is concerned, no distinction in measurement 
function can be made between the Verbal IQ 
and the Full Scale IQ for the general popula- 
tion of children. 


Factor Scores 


In the article presenting a factor-analytic 
rationale for the WAIS (Cohen, 1957a), fac- 
tor scores for the measurement of the primary 
factors were advocated. The task of formulat- 
ing such scores for the WAIS was simpler 
than proved to be the case for the WISC be- 
cause of the greater consistency of subtests in 
their factor measurement which obtained for 
adults (Cohen, 1957b) than is found to be 
the case in children. 

At the very beginning of this attempt it is 
noted that only Information loads Factor A 
exclusively and consistently, and that while 
Comprehension and Picture Completion both 
load Factor D consistently, neither does so 
exclusively (Table 1). Since to obtain relatively 
unambiguous factor scores at least two 
exclusively and substantially loading subtests 
are desirable, it is not possible to formulate 
separate A and D factor scores. 

However, it has been pointed out above that 
the distinction between Factors A and D, both 
called Verbal Comprehension factors, is not 
sharp. In the light of the high correlation gen- 
erally obtaining between them, and since the 
result was perfect consistency with the WAIS, 
it was decided to pool these two factors into 
a single Verbal Comprehension factor. When 
this is done, it is found that the four tests 
which loaded either factor both consistently 
and exclusively were the same as were com- 
posited into a Verbal Comprehension factor 
score on the WAIS, i.e., Information, Compre- 
hension, Similarities, and Vocabulary (Cohen, 
1957a, p. 456). 

A factor score for Verbal Comprehension 
(VC) is found by averaging the WISC 
weighted scores for the above four subtests. 
For a given child, Verbal Comprehension ability 
is high or low relative to children his age 
as the VC score is above or below 10, the 
population mean, and high or low ipsatively 
(i.e., within the pattern of the child’s abilities), 
as it departs from the child’s other factor 
scores. The other factor scores are similarly 
interpreted. 

Perceptual Organization factor scores (PO) 
must be differently made up at different ages. 
At ages 7-6 to 10-6, it is the average of 
Block Design, Object Assembly, and Mazes; 
at 13-6 and above, Mazes must be dropped 
from the composite because it ceases to load 
Factor B (Table 1), and the resulting PO 
score is identically composed as it is for the 
WAIS (Cohen, 1957a, p. 457). 

Freedom from Distractibility scores (FD) 
are not available at 7-6 to 10-6, since at these 
ages, no subtest can be found to supplement 
Digit Span, which alone loads Factor C ex- 
clusively (Table 1). At age 13-6, Arithmetic 
can be averaged with Digit Span for an FD 
score, resulting in the same Factor C score as 
formulated for the WAIS (Cohen, 1957a, p. 
457). 

Finally, since all subtests are correlated 
with G, a G factor score is obtained as simply 
the average of all the subtests. This score will, 
like the primary factor scores above, have a 
population mean value of 10, and be comparable 
with them for purposes of ipsative 
measurement. If only a measure of G is required, 
the Full Scale IQ (or the Verbal IQ) 
serves this purpose. If it is not necessary to 
determine the conventional IQs, a short form 
of the WISC can be administered which yields 
all the recommended factor scores. The sub- 
tests necessary for this purpose are: 


Ages 7-6 and 10-6: Information, Comprehension, 
Similarities, Vocabulary (VC); Block Design, Object 
Assembly, Mazes (PO); the preceding 7 subtests’ 
mean gives the short-form G score. 

Age 13-6: Information, Comprehension, Similari- 
ties, Vocabulary (VC); Block Design, Object As- 
sembly (PO); Digit Span, Arithmetic (FD); the 
preceding 8 subtests’ mean gives the short-form G 
score. Note that at this age, the factor scores are 
identically composed as those for adults on the WAIS 
(Cohen, 1957a). 
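
The short-form scoring rules listed above can be written out as a small sketch. The function and variable names are illustrative, the example scores are hypothetical, and age groups other than the three analyzed are rejected rather than guessed at.

```python
VC_SUBTESTS = ["Information", "Comprehension", "Similarities", "Vocabulary"]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def factor_scores(scaled_scores, age_group):
    """Short-form factor scores for age_group '7-6', '10-6', or '13-6'."""
    if age_group in ("7-6", "10-6"):
        composites = {
            "VC": VC_SUBTESTS,
            "PO": ["Block Design", "Object Assembly", "Mazes"],
        }
    elif age_group == "13-6":
        composites = {
            "VC": VC_SUBTESTS,
            "PO": ["Block Design", "Object Assembly"],
            "FD": ["Digit Span", "Arithmetic"],
        }
    else:
        raise ValueError("no factor-score composition given for this age group")

    scores = {
        name: mean(scaled_scores[s] for s in subtests)
        for name, subtests in composites.items()
    }
    # Short-form G: mean of every subtest used in the composites above.
    used = [s for subtests in composites.values() for s in subtests]
    scores["G"] = mean(scaled_scores[s] for s in used)
    return scores


# Hypothetical scaled scores for a child of 13-6 (illustration only).
child = {
    "Information": 12, "Comprehension": 11, "Similarities": 13, "Vocabulary": 12,
    "Block Design": 9, "Object Assembly": 7, "Digit Span": 10, "Arithmetic": 8,
}
result = factor_scores(child, "13-6")
print(result)  # {'VC': 12.0, 'PO': 8.0, 'FD': 9.0, 'G': 10.25}
```

Each score is read against the population mean of 10 and, ipsatively, against the child's other factor scores.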


The significance of these factor scores can 
only be determined by research. However, the 
logical implications of the analysis are such as 
to suggest that these scores, since they follow 
definable functional unities in children, should 
be of greater use than either the relatively un- 
reliable and ambiguous single subtest scores 
on the one hand, or the more or less a priori 
Verbal and Performance IQs on the other 
(Cohen, 1957a, p. 456). 

In order to facilitate the use of these factor 
scores, it is desirable that in addition to the 
population mean (which is 10 throughout), 
the variability of each factor score in the 
population be known. Applying to the stand- 
ardization data the formula for the standard 
deviation of a mean (Guilford, 1950, p. 456), 
the values of the standard deviations of each 
of the above factor scores at each age group 
were determined and are presented in Table 5. 
As an approximation in clinical work, the 
standard deviations of the primary factor 
scores can be taken as 2½, and of G factor 
scores (based on 12 subtests) as 2 (in con- 
trast to subtest scores, where the SD is 3). 
These factor scores are normally distributed, 
so that, for example, a primary factor score 
of 15 can be taken as falling approximately 
two standard deviations above the mean and 
therefore in the top 3% of the population. 
For ipsative (i.e., pattern) analysis, the 
factor scores of a subject can be compared 
Table 5 


Standard Deviations of WISC Factor Scores at 


Age 
Factor Score 7-6 10-6 13-6 
VC 2.3 2.6 2.6 
PO 2.4 2.4 
FD 2.5 
G (12 subtests) 1.9 2.0 2.0 
G (short form) 1.9 2.2 2.1 


among themselves much as subtest scores are. 
The effect of such comparison is to “partial 
out” the all-pervading influence of G, resulting 
in differences which can be attributed to 
what is specific to the primary factors in- 
volved. 
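
The normative reading of a factor score described above, and the way a standard deviation of a mean of subtests arises, can be sketched as follows. The normal-curve calculation uses the approximate SD of 2½ given in the text. The `sd_of_mean` function is the familiar result for a mean of equally variable, correlated scores; it is an assumption that this matches the cited formula, and the average intercorrelation of .50 is hypothetical.

```python
import math


def upper_tail_percent(score, mean=10.0, sd=2.5):
    """Percentage of a normal population scoring above `score`."""
    z = (score - mean) / sd
    return 100.0 * 0.5 * math.erfc(z / math.sqrt(2.0))


def sd_of_mean(k, subtest_sd, mean_intercorrelation):
    """SD of the mean of k scores with equal SDs and a given average intercorrelation."""
    return subtest_sd * math.sqrt((1.0 + (k - 1) * mean_intercorrelation) / k)


z_for_15 = (15 - 10.0) / 2.5
print(z_for_15)                               # 2.0
print(round(upper_tail_percent(15), 2))       # 2.28
print(round(sd_of_mean(4, 3.0, 0.5), 2))      # 2.37
```

The exact normal-curve figure for two standard deviations is about 2.3 percent, in keeping with the rounded "top 3%" given in the text. The second function shows why a factor score based on several subtests has an SD below the subtest SD of 3 unless the subtests correlate perfectly.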


Summary and Conclusions 


The WISC standardization data for the age 
groups 7-6, 10-6, and 13-6 were factor-analyzed 
by group, using complete centroid extraction, 
oblique rotation to a criterion of simple 
structure and a positive manifold, and a 
second-order general factor analysis. The pro- 
portions of the variance attributable to the 
general factor, communality, specificity, and 
error were compared both among the chil- 
dren’s groups and with adults’ WAIS per- 
formance (Cohen, 1957b). The following con- 
clusions were drawn: 

1. Five correlated factors were found con- 
sistently in the three children’s age groups: 
Verbal Comprehension I and II, Perceptual 
Organization, Freedom from Distractibility, 
and a quasi-specific factor. These are essen- 
tially the same factors which were found for 
adults on the WAIS. 

2. A second-order general factor, G, ac- 
counted for about one-third of the total vari- 
ance and about one-half of the true variance 
of the WISC. This factor has a very similar 
loading pattern to its adult counterpart, being 
measured best by the essentially verbal sub- 
tests. 

3. Subtest specificity is relatively small, 
which renders invalid the clinical rationales 
which are dependent on distinctive measure- 
ment functions of the 12 subtests. This again 
duplicates the findings on the WAIS (Cohen, 
1957b). 

4. It was found that children exhibit a sub- 
stantially smaller degree of generality of in- 
tellectual functioning than do adults. This is 
directly counter to the widely held belief that 
intelligence in children is highly general and 
differentiates progressively as they grow to 
maturity. 

5. Finally, from an analysis of the sources 
of score variance, each subtest’s measurement 
function was discussed in terms of G, the pri- 
mary factors, specificity and measurement 
error. It was found, with some exceptions, that 
single subtest scores do not lend themselves to 
individual interpretation. Similar analysis of 
the IQs revealed the fact that both the Verbal 
and Full Scale IQs are excellent measures of 
G. Some factor scores were proposed, and the 
manner of their utilization was described. 


“To measure the effects of any condition 
on efficiency, whether it be disorder, cere- 
bral injury, or age, should require on logical 
grounds that an individual’s efficiency be 
measured before and after the onset of the 
condition. In practice this nice logical requirement 
has not been feasible. Only very 
recently has any considerable proportion of 
the population been given the standard tests 
of intelligence, and no investigator, to our 
knowledge, has been able to obtain predis- 
order or preinjury test results from the pa- 
tients in his sample. For this reason substi- 
tute controls have been devised” (Hunt & 
Cofer, 1944, p. 972). 

Direct measures of premorbid intelligence 
are now available on all patients in the U. S. 
Armed Forces. The present study uses these 
direct measures to answer the following ques- 
tions: 

a. What is the amount of cognitive deficit 
produced by brain injury? 

b. Does brain injury cause differential deficit 
in intellectual skills? 

c. What is the best way of using the pa- 
tient’s premorbid and current level of intelli- 
gence to diagnose brain injury? 

In the past, indirect measures of premorbid 
intelligence have been used to answer these 
questions. In dealing with individual patients, 
clinical psychologists have typically assumed 
that certain intelligence tests, such as vocabu- 
lary, were resistant to brain injury, and have 
used these scores to estimate premorbid in- 
telligence. In dealing with groups, population 
norms have been used. 


1 Future papers will deal with the direct measure- 
ment of cognitive deficit in functional psychoses and 
neuroses. 


Measures 


All male enlistees and inductees who come 
under draft quotas must take the Armed 
Forces Qualification Test (AFQT). The AFQT 
is a spiral omnibus mental test which yields 
a single score. 

Since 1948, all enlisted personnel have taken 
the Army Classification Battery (ACB) upon 
entry into the Army. The ACB yields 10 sepa- 
rate scores, which, in various combinations, 
predict achievement at different Army schools 
with validities of about .60. Each test has a 
mean of 100 and a standard deviation of 20. 

By readministering either of these tests 
during or after hospitalization, direct meas- 
ures of cognitive deficit can be obtained 
(Montague, Williams, Lubin, & Gieseking, 
1957). In the present study the ACB was 
used, rather than the AFQT, so that differ- 
ential deficit could be studied. 

The first five tests of the ACB were ad- 
ministered to every S: Reading and Vocabu- 
lary (RV), Arithmetical Reasoning (AR), 
Pattern Analysis (PA), Mechanical Aptitude 
(MA) and Clerical Speed (ACS). These five 
tests require about two-and-a-half hours. 
Often it was necessary to have two separate 
testing periods, particularly with brain-in- 
jured Ss. Alternate forms of the ACB were 
used for the retests. The 10 ACB tests have 
previously been described in detail 
(Montague et al., 1957).² 


Subjects 


All Ss were enlisted males in the U. S. 
Army. The brain-injured group consisted of 


2 This study used the 1948 form of the ACB. Since 
June 1957, a revised battery containing 11 tests has 
been in use. 




Direct Measurement of Cognitive Deficit 


Table 1 


Brain-Injury Diagnoses 


Diagnoses 


Traumatic head injuries 
Posttraumatic encephalopathy 
Subdural hematoma 
Motor seizures 
Others 

Cerebral vascular accident 

Neoplasm 

Miscellaneous 


64 patients referred by the Neurology and 
Neurosurgery Services at Walter Reed Army 
Hospital. Brain injury was defined as intrinsic 
damage above the tentorium cerebelli. Table 1 
gives a summary of the diagnoses. Most pa- 
tients had diffuse rather than focal brain dam- 
age, and the amount of brain tissue involved 
varied considerably. The age range was from 
18 to 50, with a mean of 27 and a variance 
of 71. 

All of the 64 brain-injured patients had re- 
covered to a point where they could be tested, 
approximately two months after hospitaliza- 
tion. Many brain-injured patients never re- 
covered sufficiently to be tested, and these, 
of course, could not be included in the study. 

Several control groups were used: 


a. Patient controls: 47 non-brain-injured, nonpsychiatric patients from Walter Reed Army Hospital, and
b. Duty controls: 55 enlisted men on duty in a field hospital at Fort Meade, Maryland, and 60 enlisted men on duty in the Walter Reed Troop Command.


A detailed questionnaire was used to de- 
tect any control with a possible head injury. 
Such Ss were not included in the study. The 
age range of the control Ss was from 18 to 51, 
with a mean of 27 and a variance of 48. 


Results 


Table 2 presents the answer to Question a: 
What is the amount of deficit produced by 
brain injury? The brain-injured Ss decreased 
about 67% of a standard deviation, whereas 
the controls increased slightly, about 15% of 
a standard deviation. There were no signifi- 
cant differences between the hospital and 
duty controls. For each test in the brain-injured group, Student’s t showed that the decrease was significant at the .01 level.

The second question is a familiar one: Is 
there evidence of differential deficit? That is, 
are any of the five ACB tests relatively re- 
sistant to brain injury? In partial answer to 
this question, Table 2 shows that, contrary 
to the generalizations made by many previ- 
ous investigators (e.g., Babcock, 1930; Hunt, 
1943; Shipley, 1940; Wechsler, 1944), the 
deficit on Reading and Vocabulary is at least 
as great as for any of the five tests. 

For a more complete answer to the question of differential deficit, we can test the null hypothesis that the average decrease in the brain-injured group on all five tests is equal. Two approximate statistical techniques were used to test this hypothesis; both gave the same result: a significant but very small tendency for PA and MA to resist the effects of brain injury.


Table 2

Difference between ACB Premorbid and Retest Scores for Controls and Brain-Injured Patients

                     Means                                Variances
         Brain      Patient     Duty           Brain      Patient     Duty
         Injured    Controls    Controls       Injured    Controls    Controls
Test     (N = 64)   (N = 47)    (N = 115)
RV       -16.5        1.4         2.8           370.6      235.3       162.0
AR       -15.1        1.5         1.4           362.0      200.5       139.3
PA        -9.2        4.3         4.0           419.0      328.6       302.0
MA        -9.8        3.2         5.2           392.4      187.5       144.4
ACS      -16.0        3.4         4.6           373.3      213.7       143.8
D                    13.8                     5,141.3    1,869.8     1,386.7

Note. D is the sum of the differences for all five ACB scores.

The use of approximate statistical techniques is forced because there is no simple exact test for the differences between three or more correlated means.³ Therefore, Box’s method and Kendall’s W were used.

a. Box’s (1954) test for differences between correlated means. The usual two-way analysis of variance was calculated for the subject-by-difference score matrix, within the brain-injured group. Box’s test calls for a correction to the degrees of freedom which makes an approximate allowance for the effect of intercorrelated scores as well as heterogeneous variances. The F ratio for the between-tests main effect was just significant at the .05 level, when Box’s correction was applied.

One difficulty of the Box test is that it as- 
sumes multivariate normality of the five dif- 
ference scores whereas, on inspection, it ap- 
pears that in the brain-injured group, the 
scores are skewed negatively. 

b. Kendall’s W test (1948). Therefore, 
Kendall’s rank-order test was applied to see 
if the tendency for PA and MA to be re- 
sistant to brain injury was relatively consist- 
ent for all 64 brain-injured Ss. Kendall’s W 
is a close approximation to an average Spear- 
man rank-order correlation. (The significance 
test assumes that the score distribution is the 
same for each of the five tests, but the dis- 
tribution does not have to be normal.) W 
equaled .06, which is, of course, very small 
but significant at the .01 level. 
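
The computation behind W can be sketched as follows. This is a minimal illustration and not the authors' own procedure: the function names are hypothetical, tied scores receive average ranks with no tie correction, and the significance test is assumed to be the usual large-sample chi-square approximation, n(k - 1)W with k - 1 degrees of freedom. Under that assumption, W = .06 with 64 Ss and five tests gives a statistic of about 15.4 on 4 df, which exceeds the .01 critical value of 13.28.

```python
def average_ranks(values):
    """Rank values from 1 (lowest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def kendalls_w(scores):
    """Kendall's W for an n-by-k table: one row per S, one column per test.

    No correction for ties is applied (an assumption of this sketch).
    """
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    expected = n * (k + 1) / 2
    s = sum((total - expected) ** 2 for total in rank_sums)
    return 12 * s / (n ** 2 * (k ** 3 - k))


def chi_square_for_w(w, n, k):
    """Large-sample chi-square approximation for W, on k - 1 degrees of freedom."""
    return n * (k - 1) * w


# Value reported in the text: W = .06 for 64 Ss on five tests.
print(round(chi_square_for_w(0.06, 64, 5), 2))  # 15.36
```

Because W is computed from ranks within each S, it is unaffected by the negative skew of the difference scores, which is the reason given in the text for preferring it as a check on the Box test.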

What is the best way of using the premorbid and retest scores to differentiate brain-injured and controls? For groups with multivariate normal distributions and equal covariance matrices, there are only a few possible ways to do this: (a) the simple difference, T2 - T1 = D, where T2 is the sum of five retest scores and T1 is the sum of the five premorbid scores; (b) the residual change score, T2.1 = T2 - a - bT1, which corrects for the improvement shown by controls on retest (Montague et al., 1957); or (c) the Fisher (1941, p. 279) linear discriminating function, the multiple regression of all 10 scores (five premorbid tests and five retests) on the dichotomous criterion, brain-injured vs. controls.

³ Since submission of this manuscript, we have learned that Rao (1952, pp. 239-243) has an exact test appropriate to the case where the distribution is multivariate normal.

An empirical comparison of these three 
methods showed that the cumbersome multi- 
ple regression formula is not necessary here 
for optimal discrimination. The three meth- 
ods are intimately related to each other in 
the following way: the validity of the linear 
discriminating function must be equal to or 
greater than the validity of the residual 
change score, which in turn must equal or 
exceed the validity of the difference score. 
The validity of the residual change score will equal that of the multiple regression if (a) controls and brain-injured differ on T2 but not on T1, (b) the correlation between T1 and T2 is substantial, (c) the multiple regression weights for all five premorbid tests are near zero, and (d) the multiple regression weights for all five retests are positive and equal. In other words, if T1 is a suppressor variable, then the residual change score will yield optimal discrimination (Lubin, 1957).

For the computation of the Fisher linear 
discriminator, a special subgroup of 64 men 
was drawn from the sample of 162 controls so 
that each brain-injured patient was matched with a control within 20 points of his premorbid score, T1. Table 3 shows that matching on T1 tended to match all of the five premorbid scores. In general, using equal
numbers and matching will maximize the dif- 
ference between two groups. When the con- 
trols were given a dummy score of unity and 
the brain-injured a dummy score of zero, the 
multiple correlation of the linear discriminator 
with this dichotomous criterion was .66. 

To obtain the residual change score, a and b were calculated, using all 162 control Ss: b = r12(s2/s1) and a = M2 - bM1, where M1 and M2 are the control means of T1 and T2. Using the obtained values, b = .91 and a = 58.05, the validity of the residual change score was found to be .61, not significantly different from the multiple correlation.
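
The calculation of the residual change score can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the slope is taken to be the ordinary least-squares regression of the retest sum on the premorbid sum among controls (which equals r12 times s2/s1), and the example patient with a premorbid sum of 464 and a retest sum of 397 is invented for illustration. Only the constants a = 58.05 and b = .91 come from the text.

```python
from statistics import mean


def fit_control_regression(t1_controls, t2_controls):
    """Least-squares constants (a, b) for predicting T2 from T1 among controls.

    The slope b equals r12 * s2 / s1, and a = mean(T2) - b * mean(T1).
    """
    m1, m2 = mean(t1_controls), mean(t2_controls)
    sxy = sum((x - m1) * (y - m2) for x, y in zip(t1_controls, t2_controls))
    sxx = sum((x - m1) ** 2 for x in t1_controls)
    b = sxy / sxx
    a = m2 - b * m1
    return a, b


def residual_change(t1, t2, a, b):
    """Residual change score: observed retest sum minus the sum predicted from T1."""
    return t2 - a - b * t1


# Constants reported in the text; the patient's two sums are hypothetical.
print(round(residual_change(464, 397, 58.05, 0.91), 2))  # -83.29
```

A control whose retest sum equals the value predicted from his premorbid sum has a residual change score of zero, so negative scores indicate a loss relative to what controls of the same premorbid level obtain on retest.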

A further simplification is possible. If r12 is sufficiently high and s2 is approximately equal to s1, then the difference score, D, will have optimal validity. The correlation of D with the dichotomous criterion was .60, almost equal to the validity of the residual change score, but significantly lower than the multiple correlation.


Table 3

Descriptive Statistics on Premorbid and Retest Scores

Brain Injured (N = 64)   Controls (N = 64)   Brain Injured Plus Controls (N = 128)

   98.4     99.7    101.6      302.8      472.4
   92.8     94.3     96.2      341.8      393.4
   94.7     92.1     97.1      410.1      407.1
   94.0     95.4     99.5      256.3      337.1
   84.1     85.9     90.2      334.9      428.4
  464.0    467.4    484.6    4,541.5    7,026.8


It is clear that either D or T2.1 could be used for discrimination with no practical loss in validity. We have chosen to use the residual change score because it has a theoretical advantage as the ratio of controls to brain-injured increases.

How well does the residual change score discriminate the brain-injured from the controls? Unfortunately, there is no single answer to this question. If a residual change score of zero is used as a cutoff point, then 50% of the controls will be correctly classified, and 92% of the brain-injured will be correctly identified, the over-all percentage of correct classification for the total sample of 226 being 64. If -54 is used as a cutoff point, the over-all percentage of correct classification is 85 (the best we have done in this sample); for the controls, it is 94% correct classification; for the brain-injured, it is 62% correct classification.

One of the greatest difficulties in deciding which cutoff point to use is that, as previous writers have emphasized (e.g., Meehl & Rosen, 1955), usually the percentage of correct classification will shift if the ratio of controls to brain-injured is changed in subsequent samples. However, it is always possible to find a cutoff point such that the percentage of correct classification is invariant with respect to changes in base rates. This is the point where the percentage of correct classification in one group equals the percentage of correct classification in the other group. In our study, the cutoff point of equality of misclassification is -33, and it yields, as an invariant, 82% correct classification. This seems to be better than the percentage of correct classification reported in most studies of brain injury.
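
The dependence of the over-all percentage on base rates, and its invariance at the point of equal misclassification, can be sketched as follows. This is an illustration only: the function names are hypothetical, the rule that a score below the cutoff is called brain-injured is an assumption of the sketch, and the search simply scans the observed scores for the cutoff at which the two groups' percentages are closest. The group percentages of 94, 62, and 82 and the group sizes of 162 and 64 are taken from the text.

```python
def group_rates(controls, injured, cutoff):
    """Proportion correctly classified in each group at a given cutoff.

    Assumed rule: a residual change score below the cutoff is called brain-injured.
    """
    p_controls = sum(score >= cutoff for score in controls) / len(controls)
    p_injured = sum(score < cutoff for score in injured) / len(injured)
    return p_controls, p_injured


def overall_correct(p_controls, p_injured, n_controls, n_injured):
    """Over-all proportion correct, weighting each group by its size."""
    correct = p_controls * n_controls + p_injured * n_injured
    return correct / (n_controls + n_injured)


def equal_rate_cutoff(controls, injured):
    """Observed score at which the two groups' proportions correct are closest."""
    def gap(cutoff):
        p_controls, p_injured = group_rates(controls, injured, cutoff)
        return abs(p_controls - p_injured)

    return min(sorted(set(controls) | set(injured)), key=gap)


# Cutoff of -54 in the present sample (162 controls, 64 brain-injured).
print(round(100 * overall_correct(0.94, 0.62, 162, 64)))  # 85
# Same cutoff with the group sizes reversed: the over-all figure shifts.
print(round(100 * overall_correct(0.94, 0.62, 64, 162)))  # 71
# Equal rates in both groups: the over-all figure no longer depends on the ratio.
print(round(100 * overall_correct(0.82, 0.82, 64, 162)))  # 82
```

The over-all figure is a weighted average of the two group percentages, so it can change with the ratio of controls to brain-injured only when the two percentages differ.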


Discussion 


Only two previous studies seem to have 
used direct measures of the premorbid intel- 
lectual level in estimating the deficit due to 
brain damage. Canter (1951) found that the 
AGCT scores of 23 multiple sclerotics de- 
creased about three-fourths of a standard de- 
viation. Weinstein and Teuber (1957), using 
the AGCT, indicate that for patients with 
focal gunshot wounds, there is little evidence 
of deficit 10 years after injury. 

The Weinstein-Teuber findings could be in- 
terpreted as evidence that paper-and-pencil 
intelligence tests are not very sensitive to the 
effects of brain injury. Instead, we assume 
there was considerable initial deficit, but that 
the ambulatory patients of the Weinstein- 
Teuber sample regained their intellectual skills 
over a period of 10 years. This interpretation 
has the hopeful implication that brain injury 
need not cause permanent deterioration. Some 
preliminary data we have indicate that, under 
skilled medical care, almost half of the deficit 
may be regained within six months after in- 
jury. This is in general agreement with Wep- 
man’s (1951) results on the effect of training 
brain-injured patients. 

To many clinical psychologists, the most surprising result will be that brain injury
caused as much deficit in the verbal tests as 
in the others. It is commonly believed that 
verbal tests, and in particular vocabulary 
tests, are relatively resistant to brain injury. 
Why should our results differ from those re- 
ported by previous investigators? 

One possible reason is that the RV test, 
which is of the recognition type, measures 
factors other than those tapped by the usual 
recall vocabulary. The majority of the items 
in RV require the S to read and understand 
a paragraph. Only a minority are standard 
vocabulary items. However, Montague et al. 
(1957) found that in a sample of Army re- 
cruits RV had a correlation of .76 with the 
Wechsler-Bellevue Vocabulary and Informa- 
tion subtests, and a correlation of .81 with 
the Wechsler-Bellevue Verbal scale. We have since found this same correlation of .76 between RV and W-B Vocabulary in two samples (Army enlisted men, N = 45, and brain-injured patients, N = 39). It is unlikely that
RV contains additional common variance 
above that shared with Information and Vo- 
cabulary, since its correlations with the re- 
maining eight W-B subtests are distinctly 
lower. 

A more important question is whether our 
results do differ markedly from previous find- 
ings. Yates (1956) recently published an ex- 
tensive review of previous studies in which he 
concludes “. . . vocabulary does decline in patients suffering from brain-damage” (p. 436).
We are inclined to believe that those cross- 
sectional studies that reported less decline for 
vocabulary contained some bias due to the 
difficulty of matching controls with brain-in- 
jured on premorbid scores. 


Conclusions 


Traumatic brain injury results in a general 
deficit in intelligence test scores with little 
differential deficit. Reading and Vocabulary, 
Arithmetic Reasoning, and Clerical Speed de- 
cline slightly more than the spatial tests: Pat- 
tern Analysis and Mechanical Aptitude. These 
results, together with Yates’ (1956) penetrat- 
ing analysis, should end the myth that verbal 
tests such as Reading and Vocabulary are re- 
sistant to the effects of brain injury. 


The residual change score is an optimal 
discriminator between brain-injured and con- 
trols, no differential weighting of the five 
ACB scores being necessary. 


Summary 


Qualification and classification tests can be 
used as premorbid measures of intelligence 
for all patients who have been screened by 
the U. S. Armed Forces. Retests on five tests 
of the Army Classification Battery furnished 
direct measures of cognitive deficit for 64 
brain-injured patients. The average impair- 
ment was about two-thirds of a standard de- 
viation. The verbal tests (Reading and Vo- 
cabulary, Arithmetical Reasoning, and Cleri- 
cal Speed) were significantly more sensitive 
to brain injury than the spatial tests (Pattern 
Analysis and Mechanical Aptitude) but the 
differences were so small as to be of no prac- 
tical significance. 

A comparison of the brain-injured patients with 162 control Ss showed that the simplest way of discriminating controls from brain-injured was to use D = T2 - T1, the sum of the retest scores minus the sum of the premorbid scores. Theoretically, the optimal discriminator is the residual change score, T2.1 = T2 - a - bT1 (based on control group statistics), but both D and T2.1 have a point-biserial validity of about .60 and a percentage of correct classification of about 82.


THE INHIBITION PROCESS, RORSCHACH HUMAN MOVEMENT RESPONSES, AND INTELLIGENCE: SOME FURTHER DATA


MURRAY LEVINE, GEORGE SPIVACK

Devereux Foundation Institute for Research and Training, Devon, Pennsylvania

AND BYRON WIGHT

Veterans Administration Hospital, Coatesville, Pennsylvania


Rorschach (1942) originally observed that the human movement (M) response correlated with intelligence. Since then many investigators have confirmed this relationship, but the details of the relationship have received scant attention. In pursuing the relationship between the inhibition process, investigated by means of the Rorschach M response, and intelligence, the present authors undertook a survey of the reported relationship between M and various measures of intelligence. The results of this survey are presented in Table 1.¹

It will be noted there is a striking uniformity in the magnitude of correlation between M and IQ. Across a variety of groups and with differing measures of intelligence, the correlations are generally low, but significant, and with the exception of two studies (Altus & Thompson, 1949; 1952) only linear relationships are reported. The median value of the M-IQ correlations is .26.² Second, we noted that only one study reported the correlation in a population of normal children (Hertz, 1934), and that no work has been reported with atypical children. Third, the two studies that dealt only with adult schizophrenics (Singer, Wilensky, & McCraven, 1956; Taulbee, 1955) both reported correlations approaching zero. Only in two other instances are such very low correlations reported. In both cases the populations represented an extremely narrow IQ range (i.e., “ranking research scientists” [Roe, 1951] and “Yale freshmen, ages 17-19” [Vernon, 1933]), where attenuation in range would militate against significance in correlation. Roe (1951) does report a positive correlation between M and a measure of verbal intelligence in this same group, but the hypothesis that M relates primarily to verbal intelligence is not wholly tenable. Several studies (Consalvi & Canter, 1957; Holzberg & Belmont, 1952; Singer et al., 1956; Williams & Lawrence, 1953; Wishner, 1948) report the same level of significant correlations of M with one or another nonverbal measure of intelligence. While the hypothesis of restriction in range suggests itself as an explanation for the low correlations in some instances, the lack of significant findings with schizophrenic groups by two independent investigators (Singer et al., 1956; Taulbee, 1955) is counter to the trend of reports in general and no obvious explanation suggests itself.

With these issues in mind, we undertook the present study. First, we were interested in seeing whether roughly the same magnitude of correlation between M and IQ would hold in a group of abnormal adolescents, thus helping to fill in one gap in the current literature. Second, in light of the bulk of evidence for a positive M-IQ correlation, we wished to see whether we could confirm the negative findings of Singer et al. (1956) and Taulbee (1955) with hospitalized schizophrenics. This seemed important for the insight it might afford into the M-IQ relationship. Third, we were interested in attempting to confirm with different populations a previous finding that the error of reversing the mirror-image N, symbol for the number two in the W-B Form I digit symbol subtest, is related to general intelligence and to M. In earlier work (Levine, Glass, & Meltzoff, 1957) the reversal error has been described as a function of the insufficient delay or control of the tendency to write the normal N when presented with the task of writing its mirror image. This error has been shown to be significantly related to Rorschach M, a general measure of the delay function of the ego. Relating specific performances on an intelligence test to Rorschach M is one way of spelling out the details of the relationship between the personality variable of a delay function and general intelligence.

¹ A number of other investigators report data which confirm the correlation between M and IQ, but their findings are not reported in the form of a correlation coefficient. Generally they are reported in the form of mean or median differences in M in groups which differ in intelligence. These studies uniformly confirm the relationship between M and intelligence, as measured in a variety of ways (Barrell, 1953; Davidson & Klopfer, 1937-38; Diers & Brown, 1951; Neff & Lidz, 1951; Palmer, 1955; Stainbrook & Siegel, 1944; Wittenborn, 1949).

² Because so few studies report the appropriate data, it is not possible to fully examine the effect of partialling out R on the M-IQ relationship. Such figures are included for the reader’s edification in Table 1, whenever they could be obtained. In most instances, the present authors are responsible for the calculations.


Table 1

The Correlations Among M, IQ, and R in a Variety of Populations

Groups and References
Normals: Altus & Thompson (1952); Altus & Thompson (1949); Consalvi & Canter (1957); Hertz (1934); Lotsof (1953); Roe (1951); Vernon (1933)ᶜ
Outpatients: Abrams (1955); Meltzoff (1956); Wishner (1948)
Hospitalized: Armitage, Greenberg, Pearl, Berger, & Daston (1955); Beck (1932); Holzberg & Belmont (1952); Singer, Wilensky, & McCraven (1956); Taulbee (1955)ᵈ; Tucker (1950); Williams & Lawrence (1953)

N and Group
100 Coll. Stu.; 128 Coll. Stu.; 100 Coll. Stu.; 45 Adults, M & F, ages 20-36; 300 J. HS Stu.; 30 Coll. Stu.; 64 “Ranking Research Scientists”; 25 “Yale Frosh,” ages 17-19; 48 “Harvard Students,” ages 16-23; 400 Veterans, psychiat. clinic pts.; 803 Veterans, psychiat. clinic pts.; 20 Neurotics, M & F, ages 16-42; 503 Veteran psychiat. pts.; 50 M psychiat. pts.; 60 Schizophrenic veterans; 100 Neurotic veterans; 97 Veterans, psychiat. pts.; 69 “feebleminded,” M & F; 100 Schizophrenic veterans

IQ Test
Ohio Psychol. Exam; Ohio Psychol. Exam; Verbal Aptitude; Progressive Matrices; W-B Vocab.; Not stated; Ohio St. Psychol. Exam; Specially devised Verbal, Spatial, Math; Composite score from several tests; Intelligence Test Battery; W-B Info., W-B Sim., W-B Pict. Compl.; W-B; Porteus mazes; W-B; W-B; W-B Verbal, Perf.

M-IQ
.40; .19; .41; .31; .37; .47; -.03; Not sign.; .26; .30; .30

M-R, R-IQ, M-IQ·R
.41; .41; .65; .65

ᵃ Group Rorschach; M minus Popular M and Popular M produced r’s of .40 and .19, respectively.
ᵇ Group Rorschach; value of .31 is based on a repeated application of the tests to the same population.
ᶜ M% used.
ᵈ r’s not reported, but were obviously calculated.


Subjects 


The study included: 

1. 155 adolescents, both sexes, residents at a school for emotionally disturbed and retarded children, with psychiatric diagnoses including chronic brain syndrome (26%), neurotic and personality disorders (51%), and a variety of schizophrenic and psychotic reactions of childhood (23%); IQ range 41-135, normally distributed.

2. 209 adult, male, hospitalized veterans, with diagnoses of schizophrenia; IQ range 55-145, normally distributed.

3. 91 adult, male, hospitalized veterans with a psychotic diagnosis other than schizophrenia (25%), neurotic and personality disorders (39%), and chronic brain syndrome (36%); IQ range 55-135, normally distributed.

4. 132 adult, male, outpatient clinic veterans with diagnoses of schizophrenia (17%), neurotic and personality disorders (56%), and chronic brain syndrome (27%); IQ range 63-138, normally distributed.


Procedure 


In all cases the values for M, R, and IQ 
were obtained from the clinical files of the 
respective institutions. The scoring of the 
original examiner was accepted in all in- 
stances. The Rorschach and the test of in- 
telligence were administered individually, and 
a record was not accepted for the study if 
more than one week elapsed between the ad- 
ministration of the two tests. In our adult 
populations, and with most of the adolescents, 
the intelligence test used was the Wechsler- 
Bellevue Intelligence Scale, Form I. The re- 
mainder of the IQs were obtained from the 
Wechsler Intelligence Scale for Children 
(WISC). The IQ score used was always the 
full scale IQ. However, in many instances, 
the score represented a prorated value, rather 
than a value based on 10 subtests, since many 
clinicians administered only a short form of 
the test. This is not felt to be any great diffi- 
culty, since the correlation between almost 
any short form and the score derived from 
the full scale is very high. 

The measure of intelligence test performance failure was the presence of one or more reversals of the mirror-image N, symbol for the number two in the W-B, Form I digit symbol subtest. The appropriate subtest, or
the particular form of the W-B, was not ad- 
ministered to all Ss, so data are not available 
on this point for all Ss. However, we have no 
reason to expect any systematic selection of 
Ss for whom the subtest was omitted. 


Results 


The correlations between M and IQ for the 
various populations we studied are presented 
in Table 2. 

Our findings show low but significant correlations in each of the four populations studied. It will be noted that several of the correlations are based on populations which are heterogeneous for diagnosis. In the adult populations, there is no significant difference in M production between organics and other diagnoses. In the adolescent population, the median value of M increases as we move from the diagnosis of chronic brain syndrome, through schizophrenia and then to emotional and personality disorders. However, within each of these groups, when examined separately, those with more M had higher IQs than those with fewer M. We therefore do not feel that the M-IQ correlation in the group of atypical adolescents as a whole can be attributed to the heterogeneity of the population. In keeping with previous findings, there is apparently a more or less consistent relationship between M and IQ across almost any population.


Table 2

Some Further Correlations Among M, Wechsler IQ, and R

Group                                                     N    M-IQ   M-R   R-IQ   M-IQ·R
Hospitalized schizophrenic veterans                      209    .37   .62    .43     .30
Hospitalized veterans, nonschizophrenic                   91    .49   .45    .31     .41
Veterans pts., psychiat. clinic                          133    .41   .60    .41     .23
Adolescents, residential school, all diagnoses, M & F    155    .20   .45    .19*    .18*

* Significant at p < .05 > .01; all other r’s are significant at p < .01.
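
The M-IQ correlation with R partialled out, as reported in Table 2, can be illustrated with the standard first-order partial correlation formula. This is a sketch under stated assumptions: the function name is hypothetical, and it is assumed that the two middle columns of Table 2 are the M-R and R-IQ correlations and that the partial coefficient was computed in this standard way. On those assumptions, the values for the hospitalized nonschizophrenic veterans (.49, .45, and .31) reproduce the tabled partial coefficient of .41.

```python
from math import sqrt


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y with z held constant (first-order partial r)."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hospitalized nonschizophrenic veterans: M-IQ = .49, M-R = .45, R-IQ = .31.
print(round(partial_correlation(0.49, 0.45, 0.31), 2))  # 0.41
```

Because the inputs are themselves rounded to two places, a recomputed coefficient may differ from a tabled one in the second decimal.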

It will be noted that a significant correla- 
tion is obtained in the group of adult, hos- 
pitalized schizophrenics. In fact, the figure of 
.37 is well above the median value of .26 of 
all other investigations summarized in Table 1. 

The data depicting the relationship between M and reversals of the digit symbol subtest mirror-image N are presented in Table 3. The data in Table 3 are based only upon those Ss for whom evidence of reversal or nonreversal of the mirror-image N was available. For the adult hospitalized group, those who reverse the mirror-image N have significantly fewer M than those who do not make this error. The finding continues to hold when R is controlled. In the adolescent group, reversers do not produce significantly fewer M than do nonreversers, but the direction of the slight differences in percentage is consistent with current and previous findings with adults.


Table 3

Percentage of Reversers and Controls Showing Less Than Two M, Adjusted for R

                      All Adult Inpatientsᵃ                     Adolescents
                 Reversers      Controls               Reversers      Controls
R                N      %       N      %       p       N      %       N      %       p
Not adjusted     54    83.3    151    58.3    .001     30    56.7     57    47.4
<19              32    90.6     89    70.8    .05      16    68.7     25    64.0
>20              22    72.7     62    40.3    .01      14    42.8     32    34.4

ᵃ This group is drawn from the 209 schizophrenics and 91 nonschizophrenic hospitalized patients who were administered the relevant subtest.


Since the prior study (Levine et al., 1957) with adult outpatients found a significantly higher IQ for nonreversers, a similar comparison was made in our present populations. In the present adult population of hospitalized patients, the prior observation is confirmed. Nonreversers had a mean IQ of 101.3 (σ = 16.1), whereas reversers had a mean IQ of 93.5 (σ = 17.5). This difference is significant well below the .01 level of confidence. In the adolescent population, the mean IQ of nonreversers was 98.9 (σ = 17.4) and the mean IQ for the reversers was 91.7 (σ = 20.6). This difference approached significance at the .15 level of confidence.


Discussion


All of the obtained correlations between M and IQ are consistent with results obtained by others on a variety of populations. The addition of the group of atypical adolescents adds one more link in the chain, and extends our ability to generalize about the M-IQ relationship. The consistency of correlation suggests a consistency of process in any population.

The significant correlation between M and 
IQ in our adult, schizophrenic, hospitalized 
group is in keeping with the general trend of 
past reports in other populations, and it fails 
to support findings reported by Singer et al. 
(1956) and Taulbee (1955). These authors, 
contrary to the general trend of past reports, 
found near zero correlations between M and 
IQ in groups of schizophrenics. We can see 
no obvious explanation for the discrepancy 
between our findings and theirs. In all three 
instances the patient populations were highly 
similar, and the samples sufficiently large. 
We can only suggest that weight of further 
evidence be called upon to settle the question 
as it concerns schizophrenics. 

If the weight of future evidence supports 
our findings with schizophrenics, then the 
reliability of the specific result, and to some 
extent the factor analysis of Singer et al. 
(1956), would become questionable. Along 
this line we might note that Singer et al. did 
not draw an “intellectual factor” from their 
matrix of intercorrelations despite the fact 
that their variables included R, M, C’, IQ, 
and Porteus mazes. Singer’s group did report 
a significant correlation between M and 
Porteus IQ. The fact that they did not draw 
an intellectual factor is surprising, since at 
least three other factor analyses (Consalvi & 
Canter, 1957; Lotsof, 1953; Williams & 
Lawrence, 1953) of Rorschach and _intelli- 
gence test data have produced an intellectual 
factor containing R, M, and C’ among other 
Rorschach variables. We wonder whether the 
same set of four factors found by Singer 
would be drawn out if an intellectual factor 
were produced, or if the specific relationship 
between M and IQ proved to be different in 
a new population of schizophrenics. It may 
eventually prove to be the case, as Taulbee (1955) suggests, that we cannot expect the same relationships between personality and intelligence to hold in schizophrenics. However, a great deal more evidence than that currently available would have to be marshalled to demonstrate the point. The failure to repeat the zero correlation between M and IQ reported by Singer does not in any way detract from Singer’s other experimental results or theorizing.

Earlier work (Levine et al., 1957) presented the theory that the reversal of the mirror-image N was a manifestation of poor ability to inhibit or delay responses. Our finding with
the adult hospitalized patients, that reversers 
have less M and lower IQs than nonreversers, 
duplicates prior results closely. It would seem 
that adequate functioning of a delay mecha- 
nism is an important element in earning a 
good score on the intelligence test as a whole. 
Both present findings and prior work indi- 
cate that the relationship between intelligence 
test performance and inhibition processes, as 
measured by M, holds in an adult population 
irrespective of the degree of illness, as indi- 
cated by hospitalized or outpatient status. 

The reversal data in adolescents, however, present a somewhat differing picture. Adolescents who tend to reverse the mirror-image N have a lower mean IQ (as with adults). However, the tendency to reverse does not seem to be related to Rorschach M (as it is related in adults), even though M and IQ are significantly correlated. This failure to replicate the reversal-M relationship in adolescents cannot be ascribed to the unreliability of the relationship, since the finding proved stable with adults. One conclusion that can tentatively be formulated is that M does not have exactly the same significance with regard to the delay function in adolescents as in adults. This possibility becomes more cogent in light of Litwin’s (1957) failure to repeat important aspects of work on M and various forms of inhibition by Singer, Meltzoff, and others (Levine & Meltzoff, 1956; Meltzoff & Levine, 1954; Meltzoff & Litwin, 1956; Meltzoff, Singer, & Korchin, 1953; Singer & Herman, 1954; Singer, Meltzoff, & Goldman, 1953; Singer & Opler, 1956; Singer & Spohn, 1954) when she studied inhibition variables in younger age groups.

The present outcomes and survey of past 
work emphasize the relationship between a 
personality variable, the delay function of the 
ego, measured by M, and general intelligence.
It seems clear that the inhibition process bears
a relationship to intelligence test performance. 
The answer to the question whether inhibition 
relates to the level of intellectual development 
in ways more subtle than simply making or 
not making gross errors remains for future 
research. Wiener (1957), also working in a 
personality oriented framework, demonstrated 
that the independently measured personality 
characteristic of “suspiciousness” was corre- 
lated with features of intelligence test per- 
formance and with the level of intelligence. 
Many have emphasized the closely entangled 
ties between personality and intelligence, but 
generally these have been treated as “non- 
intellective” factors of intelligence. By mak- 
ing use of the theoretical framework provided 
by the psychology of ego functions, as ex- 
emplified in the work of Rapaport, Gill, and 
Schafer (1945) and Fromm, Hartman, and 
Marschak (1954) with children, it may be 
possible eventually to relate concepts of in- 
telligence and intelligent behavior within a 
single theoretical framework. 


Summary 


A median value of .26 was found in a sur- 
vey of studies reporting M-IQ correlations. 
No studies were found dealing with atypical 
children, and two studies dealing with adult 
schizophrenics reported near zero correlations 
between M and IQ. 

The present study reports M-IQ correla- 
tions for a group of atypical children (.20), 
for a group of psychiatric outpatient veterans 
(.41), for psychiatrically hospitalized, non- 
schizophrenic veterans (.49), and for hos- 
pitalized schizophrenic veterans (.37). Data 
on the relationship between reversals of the 
mirror-image N of the W-B digit symbol subtest,
Rorschach M, and IQ are also reported.
In the group of hospitalized adults, it was 
shown that reversers and nonreversers dif- 
fered for both M and IQ. The findings were 
not fully confirmed in the group of atypical 
children. 

It is suggested that a theoretical position 
relating measures of intelligence to the psy- 
chology of ego functions may eventually pro- 
vide a framework to understand concepts of 
intelligence and personality in the same terms.


This study tries to answer the question:
Can the Wechsler-Bellevue and the Rorschach 
tests predict the outcome of long-term inten- 
sive psychotherapy with very sick mental pa- 
tients? 

A number of empirical studies have been 
devoted to predicting response to psychother- 
apy (Windle, 1952) by means of psychologi- 
cal tests. In most cases, the patients were 
psychoneurotic, the psychotherapy was brief 
(rarely longer than six months), and length 
of stay in psychotherapy rather than outcome 
was the criterion. As yet there is no general 
agreement on the basic factors in prognosis. 

In a detailed study of the Rorschach, Sie- 
gel (1948) found only three prognostic signs. 
Dickson (1949) found more, but his signs did 
not agree with those of Siegel. 

Malamud and Gottlieb (1942) found that 
high Binet scores made it more likely that 
neurotics would benefit from psychotherapy, 
but the trend did not reach the .05 level of 
significance. Miles et al. (1951) found a simi- 
lar but significant result using the Wechsler- 
Bellevue. On the other hand, Dickson (1949) 
found no evidence that IQ was prognostic. 

These examples are characteristic of most 
studies. Windle (1952) reports some 30 stud- 
ies, covering a wide variety of therapies, in 
which cognitive tests were used as predictors. 
In one third of them, the intelligent patients 
were more likely to improve; in another third 
the opposite was true; the remaining third 
showed no significant relation. It seems then 
that no consistent relation has been demon- 
strated between intellectual level and the re- 
sults of psychiatric therapy. 

The present study was conducted at Chestnut
Lodge, Inc., a private hospital of about
100 beds, devoted to psychotherapy. It is 
important to describe the milieu since it is 
unique. No shock or psychosurgical treat- 
ments are used at this hospital. Well over 
half the patients are schizophrenics, many of 
whom have been hospitalized for years be- 
fore coming to Chestnut Lodge. Socioeconomic 
status and intellectual level tend to be high. 
Every patient is seen regularly by an ana- 
lytically trained psychiatrist, usually four 
hours a week. The basic philosophy of the 
hospital includes the assumptions that pa- 
tients may require years of hospital treat- 
ment; that they probably will continue in 
therapy after leaving the hospital, often for 
several years; and that all patients, no mat- 
ter how deeply psychotic, can recover. 

The high economic status of the patients, 
the long intensive psychoanalytic therapy, 
and the general hospital milieu make this 
sample a homogeneous group of mental pa- 
tients of a type never studied before from a 
prognostic point of view. 


Method 
Subjects 


The Ss of this study were all the patients 
who met the following conditions: 

a. A Rorschach was given at Chestnut 
Lodge, or a Rorschach protocol taken within 
the six months prior to admission was avail- 
able. 

b. The patient was in treatment at Chest- 
nut Lodge at least two years following ad- 
ministration of the test. This is important 
because the main purpose of this study is to
answer the question whether tests have prognostic
significance, and the prognosis must be
made, of course, from the time of testing for- 
ward. 

c. The patient did not have any known 
brain damage. 

d. One of the patient’s treatment years 
postdated January 1, 1950, and all the re- 
quirements listed above were fulfilled by Oc- 
tober 1, 1955. No information following that 
date was used. 

Ninety-three patients (40 males and 53 fe- 
males) fulfilled all the conditions. The me- 
dian age of the males was 26 and of the fe- 
males, 28, with a range from 16 to 53. Table 1 
gives their distribution according to diagnostic 
category. 


Criterion 


The really difficult part of any prognostic 
study is to define precisely (a) what one is 
trying to predict, and (b) for what time one
is trying to predict it. 

a. No attempt is made in this paper to pre- 
dict personality change per se, or any of the 
psychological conditions usually conceived of 
as desirable end-states in psychotherapy. Only 
the patient’s social adjustment, as shown by 
where he lived, was considered. The most im- 
portant distinction is whether the patient was 
in or out of a mental hospital. If he was in a 
hospital, was he on a disturbed ward or a 
convalescent ward? If he was out of a hospital,
was he an outpatient, a private patient,
or was no treatment needed? Changes from
one such status to another constituted our
criterion of improvement. Graphs were made
for each patient showing his status for each
day using the following status scores:


5: Disturbed ward
4: Semi-disturbed ward
3: Convalescent ward
2: Outpatient
1: Private patient
0: No treatment needed


Table 1


Psychiatric Classification of Patients Used
in This Study


Diagnoses                            N
Schizophrenia, paranoid type        43
Schizophrenia, catatonic type       10
Schizophrenia, hebephrenic type      6
Schizophrenia, simple type           1
Schizophrenia, other types          10
Manic depressive psychosis           2
Schizoid personality                 1
Psychopathic personality             2
Personality disorder                 2
Psychoneurosis                      16


b. Whether a patient seems to do well or
poorly depends upon the period of time over
which he is observed. The term “final outcome”
is either meaningless or misleading in
psychiatry. Does it refer to the patient’s state 
of mental health at the time of his death? 
Any other point in his life has to be chosen 
arbitrarily and is often followed as well as 
preceded by considerable change for the bet- 
ter or for the worse. A study of the results of 
therapy should, therefore, include a descrip- 
tion of the patient’s course for a stated pe- 
riod of time and whether it is upward or 
downward, or upward and downward. Clearly, 
no simple function of the status scores can 
represent adequately a patient’s course. There- 
fore, 11 categories were set up which formed 
the social adjustment criterion. 

Category 11 denotes the best outcome; 
Category 1 denotes the worst. (The numbers 
in brackets refer to the number of patients in 
the category.) 


11. All scores between 0 and 1, i.e., patient was never
hospitalized [2].

10. Scores begin at 3 or more and are down to 1 or
less by the end of 1 year, i.e., the patient starts
in the hospital, improves consistently and is out
of the hospital by the end of 1 year [9].

9. Scores begin at 3 or more, are down to 1 or less
by the end of 1 year, as in Category 10, but the
decrease is inconsistent, up and down [0].

8. Scores begin at 3 or more, are down to 1 or less
between the end of the first and the end of the
third years, i.e., the patient starts in the hospital,
improves consistently, but more slowly and is out
of the hospital by the end of the third year [11].

7. Scores begin at 3 or more, are down to 1 or less
between the end of the first and the end of the
third year, as in Category 8, but the decrease is
inconsistent, up and down [6].

6. Scores begin at 3 or more and are down to 1 or
less after the end of 3 years, i.e., the patient starts
in the hospital, improves consistently but very
slowly and is not out of the hospital until after
the end of 3 years [3].

5. Scores begin at 3 or more and are down to 1 or
less after the end of three years, as in Category 6,
but the decrease is inconsistent, up and down
[10].

4. Scores begin at 3 or more; there is no steady decrease,
but the monthly averages are 1.6 or less
for at least 3 months at the end, i.e., patient’s
course is inconsistent, but there is a trend toward
improvement at the end of the period [7].

3. Scores begin at 3 or more. There is no steady
decrease, i.e., patient’s course is inconsistent with
no trend toward improvement at the end of the
period [22].

2. Scores do not go below 3, but rarely go higher,
i.e., the patient remains consistently hospitalized
but chiefly on a convalescent ward [7].

1. Scores do not go below 3 and are for the most
part higher than 3, i.e., the patient remains consistently
hospitalized, chiefly on a disturbed ward
[16].
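
The eleven categories above were assigned by raters looking at graphs, so any mechanical version is only an approximation. The sketch below shows one way the definitions could be turned into rules. It assumes status scores run from 0 (no treatment needed) to 5 (disturbed ward), that the input is one average score per month, that "down to 1 or less" means the first month after which the score never again exceeds 1, that "consistent" means the score never rises before that month, and that "for the most part" means more than half the months. The function name `classify_course` and all of these readings are illustrative assumptions, not the authors' procedure.

```python
from typing import List, Optional


def classify_course(monthly: List[float]) -> Optional[int]:
    """Assign one of the 11 social adjustment categories to a course.

    `monthly` holds one average status score per month of follow-up
    (assumed scale: 0 = no treatment needed ... 5 = disturbed ward).
    """
    if not monthly:
        raise ValueError("need at least one monthly score")

    # Category 11: never hospitalized.
    if max(monthly) <= 1:
        return 11

    # Categories 1 and 2: never below convalescent-ward level.
    if min(monthly) >= 3:
        months_above_3 = sum(1 for s in monthly if s > 3)
        # Assumption: "for the most part" = more than half the months.
        return 1 if months_above_3 > len(monthly) / 2 else 2

    # All remaining printed categories start in the hospital.
    if monthly[0] < 3:
        return None  # not covered by the printed definitions

    # Assumption: "down to 1 or less" = first month from which every
    # later score stays at 1 or below.
    settled = None
    for i in range(len(monthly)):
        if all(s <= 1 for s in monthly[i:]):
            settled = i
            break

    if settled is None:
        # Categories 3 and 4: no sustained move out of the hospital.
        improving = len(monthly) >= 3 and all(s <= 1.6 for s in monthly[-3:])
        return 4 if improving else 3

    # Assumption: "consistent" = scores never rise before settling.
    path = monthly[: settled + 1]
    consistent = all(b <= a for a, b in zip(path, path[1:]))

    months_elapsed = settled + 1
    if months_elapsed <= 12:
        return 10 if consistent else 9
    if months_elapsed <= 36:
        return 8 if consistent else 7
    return 6 if consistent else 5
```

A course that begins below 3 without staying at 1 or less is returned as `None`, since the printed definitions do not say how such a patient was placed.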


Three psychologists made independent clas- 
sifications with almost perfect agreement. 
Ninety-five per cent of all patients were 
placed in the same category by all three 
raters.¹

Reliable information on the patient was 
available for periods of two to nine years. 
This is not necessarily all time in treatment, 
but simply time over which there is reliable 
information. The patients were divided into 
four groups according to the length of time 
during which information was available about 
them after the testing date. (The distribution 
is given in Table 3.) 


Prediction Measures 


Wechsler-Bellevue Adult Scale of Intelligence.
A Wechsler-Bellevue, Form I or II,
had been administered to 90 of the 93 Ss 
close to the time of the administration of the 
Rorschach. Only the Full Scale IQ, calculated 
on the basis of at least 10 subtests, was used 
in this study. 

Rorschach. Preliminary studies on the cor- 
relation of particular Rorschach factors with 
the patients’ clinical courses were unpromis- 
ing. The senior author, therefore, developed 
the following broad descriptive statements 
about good signs and poor signs in the test: 


1 The authors are indebted to Harold L. Williams,
Chief of the Department of Clinical and Social Psy- 
chology, Walter Reed Army Institute of Research, 
for help in this classification. 


Good Signs 


1. Patient goes at the test competently. This may
be shown in a variety of ways, e.g.:

a. He gives a large number of R which are not
simply empty perseverations or repetitions of
his preoccupations, but are of good quality.

b. He does not hesitate a great deal, but is fairly
speedy in his productions.

c. He speaks as if he knows what he is talking
about.

d. He shows interest, alertness, and/or enjoys the
task, and/or shows some humor (not of the
unconscious variety).


2. The patient’s test does not show clear evidence 
of manifest psychosis. It might be diagnosed neurosis
(hysteric or obsessional), schizoid, or latent
schizophrenia. Mild psychosis is in between good 
and bad. 


a. As a corollary of this there is a good number 
of P. 

3. The content is lively, imaginative, varied, origi- 
nal. Dysphoric, morbid content is not the best but is 
much better than dull content. 

4. There is good movement, color, shading, but 
particularly good M. 

5. The patient seems to be intelligent and able to 
function intelligently at present. This may seem to 
be the same as the points above, but any additional 
evidence of it is a good sign. 


Poor Signs 


1. Patient shows signs of incompetence in fulfilling
the task. This may be shown in a variety of ways,
e.g.:

a. He gives a meager quantity of R.

b. He rejects one or more cards.

c. He takes an inordinately long time.

d. He expresses or shows extreme fatigue, disinterest,
restlessness or the like.

e. He is very uncertain of what he perceives. Try
to distinguish this from the neurotic’s uncertainty
about whether he is pleasing the examiner.

2. The patient shows clear signs of overt psychosis.
This is especially bad if the signs are so clear that a
person who is not a Rorschach expert, reading the
protocol, could identify it as a psychotic production
simply from the verbalizations. It is also bad if the
Rorschach worker finds the diagnosis of overt psychosis
easy to make on a casual skimming of the
protocol. The easier the diagnosis, the worse the
prognosis.

a. As a corollary of this there is low P.

3. The content of the record is dull, empty,
repetitive.

4. There is little movement, color, or shading.

5. The patient seems to be either of poor intellectual
endowment or shows at present gross intellectual
impairment.


Fig. 1. Social adjustment vs. W-B IQ.


The statements were given to two experienced
Rorschach workers,² together with the
Rorschach protocols of the 19 patients in 
Group D (7-9 years of follow-up). They had 
no information about the patients other than 
age and sex. When it was found that their in- 
dependent global ratings had substantial cor- 
relations with each other and with the social 
adjustment criterion, all 93 Rorschach pro- 
tocols were turned over to a competent but 
relatively inexperienced clinical psychologist.³
Eight protocols were used for practice and 
training, leaving a total of 85 protocols for 
scoring according to the “good and poor 
signs.” A high score meant a predominance 
of good signs and vice versa. 


Results 


Figure 1 shows the relation of social adjust- 
ment to the W-B intelligence score. Clearly, it 
is not a bivariate normal distribution, but is 
the triangular or fan shape frequently met 
in psychological studies. Although not given 
here, the plot of social adjustment against the
Rorschach rankings showed the same heteroscedasticity.
Figure 1 shows clearly that accuracy
of prediction from Wechsler IQ increases
as the IQ decreases. For Ss with an
earned IQ higher than 110, the predictions of
social adjustment are no more accurate than
would be obtained by chance.

2 The authors are indebted to Margaret Thaler
Singer and Nathene Loveland, Department of Clinical
and Social Psychology, Walter Reed Army Institute
of Research.

3 The authors are indebted to Eugene Stammeyer,
Department of Psychology, St. Elizabeth’s Hospital.

For the males, the correlation of W-B IQ 
with social adjustment is .50, and for the fe- 
males is .55. For the total group the correla- 
tion is .52. Because of heteroscedasticity, the 
usual tests of significance were not made. 
Various rank-order and chi-square tests showed 
that the relation was significant. 


Table 2


Intercorrelations of Social Adjustment, Rorschach Ratings,
and Intelligence for Patients with 7-9 Years
of Treatment Plus Follow-up Time

(Table entries not preserved. Variables listed: N; ratings by
Rorschach workers (Experienced, Experienced, Inexperienced);
Wechsler-Bellevue IQ; Social adjustment criterion.)


Table 3


Intercorrelations of Social Adjustment, Intelligence, and Rorschach Rating by Follow-up Group


        Treatment Plus        Wechsler IQ
Group   Follow-up Time      Mean   Variance
A       2 Years              112      408
B       3-4 Years            119       67
C       5-6 Years            105      442
D       7-9 Years            113      246


Note. c is the social adjustment criterion, e is the Rorschach rating, and w is the Wechsler IQ; Rc.ew is the multiple correlation of
the criterion with the Rorschach and Wechsler.


A premorbid W-B IQ was estimated on the 
basis of the W-B protocol. A deficit score was 
computed by subtracting the earned IQ from 
the estimated IQ. This index of deterioration 
had a correlation of about .50 with social ad- 
justment, the same as the earned IQ. 

Table 2 gives the results of applying the 
Rorschach global rating, described previously, 
to Group D (7-9 years of treatment plus 
follow-up time). 

It seems that the two experienced Ror- 
schach workers produce higher correlations, 
not only with each other but also with the 
Wechsler IQ, than did the inexperienced cli- 
nician. 

Table 3 tries to answer several questions. 
Does the predictability of the criterion change 
as follow-up time increases? Will joint use of 
the Rorschach rating and Wechsler IQ im- 
prove prediction over use as individual scores? 
There seems to be no systematic change in 
predictability as follow-up time increases. The 
zero-order correlations make it clear that the 
Rorschach and the Wechsler predict with al- 
most equal accuracy. The multiple correla- 
tions indicate that a linear combination does 
not improve the prognosis significantly. 
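
The multiple correlations discussed here follow from the three zero-order correlations by the standard two-predictor formula. As a minimal sketch, the function below (the name `multiple_r` is an illustrative choice) reproduces the Group D figure, reading the Group D row of Table 3 as criterion correlations of .75 and .69, a predictor intercorrelation of .52, and a printed multiple correlation of .83.

```python
import math


def multiple_r(r_c1: float, r_c2: float, r_12: float) -> float:
    """Multiple correlation of a criterion with two predictors.

    r_c1, r_c2: zero-order correlations of each predictor with the criterion.
    r_12: correlation between the two predictors.
    """
    if abs(r_12) >= 1:
        raise ValueError("predictors must not be perfectly correlated")
    r_squared = (r_c1**2 + r_c2**2 - 2 * r_c1 * r_c2 * r_12) / (1 - r_12**2)
    return math.sqrt(r_squared)


# Group D values as read from Table 3 (see the assumptions above).
print(round(multiple_r(0.75, 0.69, 0.52), 2))  # 0.83
```

The same formula applied to the Group C values (.44, .65, and an intercorrelation of .64) gives .65, equal to the better single predictor, which is the sense in which combining the two tests adds little.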

In Table 3, Group B (3-4 years of treat- 
ment plus follow-up time) obviously differs 
from the others in that all correlations are 
near zero. But essentially this situation does 
not differ from that in other groups, for in 
Group B there is just one S with an IQ lower 
than 110, and Fig. 1 shows that it is only for 
such Ss that accurate prediction can be made. 
The lowered variance on the Wechsler thus
seems to account for the near zero correlations.


Discussion


The most striking and practically useful 
finding in this study is that the tests, as used 
here, do not predict social adjustment for pa- 
tients who do well on the psychological tests, 
but do predict with virtual certainty for those 
who do poorly. Whether this poor perform- 
ance is a function of inferior native endow- 
ment or the result of so-called psychotic de- 
terioration or a combination of the two, is a 
question which cannot be answered from this 
study. 

In practice, the clinician utilizing long-term 
intensive psychotherapy with very sick men- 
tal patients will be confronted with two types: 
(a) those whose intellectual functioning is 
relatively intact; who approach a task in- 
volving creative imagination with alert com- 
petence; and who, if they are psychotic, still 
maintain “clear” areas, and (b) those who
show obvious intellectual impairment and ob- 
vious all-pervasive psychosis, whose produc- 
tions are meager, dull, and repetitive. The pa- 
tients in the first group may or may not 
recover. The factors which lead to recovery 
could not be identified in this study. 

The patients in the second group did not
recover. The possibility cannot be excluded
that continuing treatment over a still longer
period of years may produce better results,
but the conclusion remains that such results
occur only when the effort is enormous and
the case is exceptional. All of these patients
were schizophrenics; their characteristics are
like those which have been described in the
literature as process schizophrenia. The clinician
must ask himself whether psychotherapy,
as it is now conceived, is really the
method of choice with these patients since
the results over such a long period of time
are so meager.


Table 3 (continued)


        Treatment Plus
Group   Follow-up Time     N    Correlations (as printed)
A       2 Years           17    .51    .26    .54
B       3-4 Years         18   -.05   -.16   -.04
C       5-6 Years         31    .44    .65    .64    .65
D       7-9 Years         19    .75    .69    .52    .83
Total                     85

Since no consistent relation has been dem- 
onstrated between intellectual level and out- 
come when all types of therapy are considered 
together (Windle, 1952), the high prognostic 
accuracy in this study for patients with low 
intelligence is probably linked to the kind of 
patient, the kind of milieu, and the kind of 
psychotherapy administered. 


Summary


Ninety-three patients undergoing intensive
psychotherapy for at least two years were
given psychological tests and followed for additional
periods up to seven years. Either the
Wechsler-Bellevue or the Rorschach could be
used to predict the patient’s course with a
validity of about .50; combining the two did 
not improve prediction. A prognosis of non- 
improvement could be made with virtual cer- 
tainty on the basis of poor test performance, 
but good test performance was not prognostic. 


Received May 28, 1958.


In recent years there has been a change in
the clinical usage of the Minnesota Multiphasic
Personality Inventory (MMPI) validity
scales. Since the advent of pattern interpretation,
the original use of the L-F-K
scales as indicators of test-taking attitudes
has broadened to deal with such personality 
variables as self-concept, reality-testing, ade- 
quacy of social behavior, degree of behavioral 
disturbance, and general adjustment mecha- 
nisms. 

Examination of data from various studies 
suggests that F is positively related to behav- 
ioral disorganization (Gough, 1946; Schmidt, 
1945). Schiele and Brozek (1948) report that
F tends to increase with severity of disturb- 
ance following semistarvation. Gough (1950) 
indicates that Ss attempting to present them- 
selves as disturbed have a high F and low K, 
while Ss told to make a “good impression” 
have a high K and L. Drasgow and Barnette
(1957) report that Ss motivated to indicate 
their capabilities as employees worthy of ad- 
vancement had a high K and low F. 

The purpose of this study is to investigate 
the relationships among the validity scales 
with regard to some personality variables. 

1 The author wishes to acknowledge his indebtedness
to William J. Eichman for his aid in planning
the study and to Burke M. Smith, Earl G. Guyer,
and Roy A. Eck for their critical reading of the
manuscript. Thanks also to the Medical Illustration
Service for making the graphs.

Since this study is exploratory, only two
basic hypotheses were made, utilizing current
clinical practice as well as the original logic
of the scales:

1. Ss relatively nonbehaviorally disturbed
have L > F, K > F (a “V” shape).

2. Ss severely behaviorally disturbed have
F > L, F > K (a “caret” shape).
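
The two hypothesized shapes amount to a simple comparison of the three validity-scale scores. A minimal sketch follows; the function name `validity_profile_shape` and the example T scores are made up for illustration and do not come from the study.

```python
def validity_profile_shape(l: float, f: float, k: float) -> str:
    """Label an L-F-K profile by the two hypothesized shapes."""
    if l > f and k > f:
        return "V"        # F is the low point (hypothesis 1)
    if f > l and f > k:
        return "caret"    # F is the high point (hypothesis 2)
    return "neither"      # ties, or a steadily rising or falling profile


# Hypothetical T scores, for illustration only.
print(validity_profile_shape(56, 48, 60))  # V
print(validity_profile_shape(50, 70, 52))  # caret
```

Profiles that rise or fall steadily from L to K match neither hypothesis and are reported as such.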


Method 


A group of 100 newly admitted psychiatric 
patients, 50 males and 50 females with a pri- 
mary diagnosis of schizophrenia, who were 
given the MMPI during the first month after 
admission, were selected as Ss. Patients over 
60 years of age or with a history of organic 
brain damage were excluded. 

The mean age of the male Ss was 33 years, 
with a range from 22 to 55. The mean age of 
the female Ss was 34 years, with a range from 
21 to 51. The mean education of the male Ss 
was 11.6 years and of the females 11.8. For 
the males, the subclassifications of schizo- 
phrenic reactions were: 1 catatonic, 20 para- 
noid, 1 affective, 28 undifferentiated. For the 
females the subclassifications of schizophrenic 
reactions were: 2 catatonic, 1 hebephrenic, 
18 paranoid, 4 affective, 25 undifferentiated. 
There were no significant differences in any 
of the above between sex groups. 

The Ss were rated by the investigator on 
scales of behavioral disturbance (BD) and 
social adjustment (SA).² The scales suffer
from the fact that they had to be formulated 
to deal with existing clinical record data. BD 
was determined by rating the S as —1, 0, or 
+1 on 9 categories with a score of —1 being 
in the disturbed direction. The general cate- 
gories were ward adjustment, obvious behav- 
ioral manifestations of psychosis, and therapy 


2 A table giving the complete scales has been deposited
with the American Documentation Institute.
Order Document No. 5888, remitting $1.25 for microfilm
or $1.25 for photocopies.


Table 1


Analysis of Variance of L-F-K Relationships with
Mild and Severe Behavioral Disturbances and
with Male and Female Schizophrenics


Source                           df    Mean Square
Between Subjects                 99       102.59
  Behavioral disturbances (BD)    1       216.75
  Sex (S)                         1       234.47
  BD × S                          1        99.38
  Error (b)                      96       100.51
Within Subjects                 200       133.36
  Validity scales (V)             2       617.10*
  V × BD                          2     1,418.57**
  V × S                           2        27.43
  V × BD × S                      2       214.33
  Error (w)                     192       115.18
Total                           299


* F significant at .01.
** F significant at .001.

assignment adjustment for the 30 days fol- 
lowing the administration of the MMPI. The 
SA criterion was found by similar ratings on 
9 categories dealing with educational, marital, 
and financial adequacy, family background, 
and past history of emotional disturbances. 
The pluses and minuses were summed and a 
distribution of the final scores on BD and on 
SA were formed. Each distribution was then 
divided at the mean, the lower scores falling 
into the “severe BD” category and the “poor 
SA” category respectively. The higher scores 
were placed into the “mild BD” category and 
the “good SA” category, respectively. 
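
The scoring described above (nine ratings of -1, 0, or +1, summed, with the group then divided at the mean) can be sketched as follows. The function names and the three example subjects are hypothetical, and the handling of a score that falls exactly at the mean is an assumption, since the text does not say how such cases were treated.

```python
from statistics import mean
from typing import Dict, List


def total_score(ratings: List[int]) -> int:
    """Sum nine category ratings, each -1, 0, or +1."""
    if len(ratings) != 9 or any(r not in (-1, 0, 1) for r in ratings):
        raise ValueError("expected nine ratings of -1, 0, or +1")
    return sum(ratings)


def split_at_mean(scores: Dict[str, int], low: str, high: str) -> Dict[str, str]:
    """Label each subject by whether the score falls below the group mean."""
    cutoff = mean(list(scores.values()))
    # Assumption: a score exactly at the mean goes to the higher group.
    return {s: (low if v < cutoff else high) for s, v in scores.items()}


# Hypothetical ratings for three subjects, for illustration only.
bd_scores = {
    "S1": total_score([-1, -1, 0, -1, -1, 0, -1, -1, -1]),
    "S2": total_score([1, 0, 1, 1, 0, 0, 1, 1, 0]),
    "S3": total_score([0, 0, -1, 0, 1, 0, 0, 0, 0]),
}
print(split_at_mean(bd_scores, low="severe BD", high="mild BD"))
```

The same two functions would serve for the SA scale with the labels "poor SA" and "good SA".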

The scores on the BD and SA scales were
compared with the scores on the validity
scales of the MMPI. The statistical designs
used were Type III three-dimensional and
Type I two-dimensional analysis of variance
designs as recommended by Lindquist (1956).


Fig. 1. Mild and severe BD groups on the MMPI
validity scales. (Profiles plotted: severe BD, M + F;
mild BD, M + F.)


Leonard R. Gross


Table 2


Analysis of Variance of L-F-K Relationships with
Good and Poor Social Adjustment and with
Male and Female Schizophrenics


Source                           df    Mean Square
Between Subjects                 99       102.59
  Social adjustment (SA)          1       298.01
  Sex (S)                         1       234.47
  SA × S                          1        42.17
  Error (b)                      96       100.24
Within Subjects                 200       133.36
  Validity scales (V)             2       617.10*
  V × SA                          2        50.08
  V × S                           2        27.43
  V × SA × S                      2       971.20**
  Error (w)                     192       121.85
Total                           299


* F significant at .01.
** F significant at .001.


Results 


Fig. 2. Good and poor SA groups on the MMPI
validity scales.

Fig. 3. SA and BD profile combinations on the
MMPI validity scales.

A three-dimensional analysis of variance
was done which included the variables of
(a) mild and severe BD, (b) Sex, and (c)
the Validity scales. The results of this analysis
are presented in Table 1. In this design,
the variables of BD and Sex, taken alone,
have no psychological meaning, since the
scores dealt with are summations of the T
scores across the L-F-K scales. Likewise, there is
little interest in the effect of the Validity
scales taken alone. In this case, the Validity
scale’s effect is significant, and further analysis
indicates that for the total sample F > L
or K. This is to be expected in a schizophrenic
sample. Prime interest lies in the interactions
of the Validity scales with Sex and
with BD or with both. In this analysis the 
triple interaction is not significant, and there 
are no significant interactions with the sex 
of the Ss. The difference between mild and 
severe BD for the Validity scales was sig- 
nificant beyond the .001 level. Figure 1 rep- 
resents this significant interaction. Hypothe- 
sis I, that Ss with mild BD have L > F, 
K > F, is confirmed. Hypothesis II, that Ss 
with severe BD have F > L, F > K, is also 
confirmed. 
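
The F ratios behind these statements can be recomputed from the mean squares printed in Table 1. The sketch below assumes the usual convention for a mixed design of this kind: between-subjects effects are divided by Error (b) and within-subjects effects by Error (w). The variable and function names are illustrative.

```python
# Degrees of freedom and mean squares as printed in Table 1.
between = {
    "BD": (1, 216.75),
    "Sex": (1, 234.47),
    "BD x Sex": (1, 99.38),
}
within = {
    "Validity scales": (2, 617.10),
    "V x BD": (2, 1418.57),
    "V x Sex": (2, 27.43),
    "V x BD x Sex": (2, 214.33),
}
error_b = (96, 100.51)   # between-subjects error term
error_w = (192, 115.18)  # within-subjects error term


def f_ratios(effects, error):
    """Divide each effect mean square by the matching error mean square."""
    error_df, error_ms = error
    return {
        name: (df, error_df, round(ms / error_ms, 2))
        for name, (df, ms) in effects.items()
    }


all_ratios = {**f_ratios(between, error_b), **f_ratios(within, error_w)}
for name, (df1, df2, f) in all_ratios.items():
    print(f"{name}: F({df1}, {df2}) = {f}")
```

On these figures the two starred entries, the Validity scales and the V x BD interaction, give the two largest ratios (about 5.36 and 12.32), while every other ratio is below 2.5.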

An identical analysis of variance design for
SA and the three validity scales also yields
two significant F ratios, one for the validity
scales beyond .01 as in BD and the other a
triple interaction among the validity scales,
SA and sex, which was significant beyond the
.001 level. Table 2 presents the summary
table and Fig. 2 represents the significant interaction.
As can be seen, male Ss with good
SA have a similar configuration to female Ss
with poor SA.


Table 3


Profile Combinations (see Fig. 3) Significant
Beyond the .05 Level

(Table entries not preserved.)

* A = Good SA and mild BD, M + F; B = Poor SA and
mild BD, M + F; C = Poor SA and severe BD, M + F;
D = Good SA and severe BD, M; E = Good SA and severe
BD, F.


Fig. 4. Summary profiles of significant SA and BD
combinations on the MMPI validity scales.

Because of the discrepancy between males 
and females on SA, two-dimensional analyses 
were run between SA and BD for the two 
sexes in order to ascertain whether or not 
males and females could be combined for any 
BD, SA combination. The four pairs of pro- 
files compared for sex differences were good 
SA and mild BD, good SA and severe BD, 
poor SA and mild BD, poor SA and severe 
BD. This was possible to do since product- 
moment correlations between BD and SA for 
the males and females yielded an r of .14 
and .07, respectively, indicating that the two 
scales were measuring different variables. The 
only difference found between the two sexes 
was on good SA and severe BD, significant 
beyond the .001 level. Therefore the eight 
profiles could be reduced to five, three in 
which males and females had similar pro- 
files, and two in which the males and females 
had dissimilar configurations. The five profiles 
are diagramed in Fig. 3. 

Two-dimensional analyses were then made 
between all combinations of the five profiles 
in Fig. 3 to determine if any of them differed 
significantly from any other. Table 3 indi- 
cates the results of the 10 analyses. On the 
basis of these analyses, Profiles A, B, and D 
in Fig. 3 were combined, leaving three dis- 
tinct profiles as in Fig. 4. These were Profiles 
C, E, and a category in which A, B, and D 
belong.


Discussion


It can be suspected that the small n for
several of the profiles in Fig. 3 contributed 
to their lack of statistical stability. 

The three significant profiles generally in- 
dicate that the F scale is in large part a posi- 
tive function of behavioral disorganization. 
Thus a high F and low L may indicate overt 
expressions of pathological feelings as the 
relatively high F of delinquents in Rempel’s 
(1958) study suggests, rather than dissem- 
bling or test-taking attitude as such. 

The poor SA and severe BD males and fe- 
males have a high F with a lower L and K, 
although both the latter two are above the 
mean. A clinical interpretation supported by 
the data would be that an individual with 
such a configuration is attempting to present 
himself as socially acceptable, conforming and 
adequate (L) and is denying social difficulties 
as well as attempting to utilize repressive de- 
vices (K). Because of severe and chronic pa- 
thology he cannot separate normal from bi- 
zarre responses, hence the high F. In short, 
this would be a chronically disturbed indi- 
vidual who is attempting both consciously 
and unconsciously to be defensive but is in- 
capable of doing so. 

In contrast with this profile is the good SA 
and severe BD female Ss who have an L and 
K below the mean as well as a high F. This 
group can be considered acutely disturbed as 
they have had good social adjustment in the 
past. They are blatantly admitting to pa- 
thology and revealing a poor self-concept, 
lability, and dysphoria. Thus it would be ex- 
pected that female patients who have made 
a somewhat adequate social adjustment in 
the past are less likely to be defensive and 
guarded regarding their current disturbance 
on either a conscious or unconscious level. 

The other profile, a combination of A, B, 
and D, is composed of three groups that score 
in the more healthy direction on one or both 
of the criterion scales. This group seems to 
have some assets that do not exist in the poor 
SA and severe BD group. It can be seen from 
Fig. 3, Profile A, that the good SA and mild 
BD group most nearly represents a straight 
line. They have been able to maintain some 
balance between self-concept, a conscious desire 
to perform adequately, and overt behavior. 
Their attempt to present themselves 
as adequate, nondisturbed, and conforming to 
social expectations is somewhat successful. 

It would appear from inspection of the poor 
SA groups that they tend to have a relatively 
high L and K. Their reliance on denial of illness, 
inadequate repressive mechanisms, as 
well as their rigidity, would indicate poor in- 
terpersonal relations contributing to poor so- 
cial behavior. 

One of the most striking differences sta- 
tistically is between the good SA and severe 
BD males and females. The females, as 
pointed out before, are admitting freely to 
pathology and acting out their symptoma- 
tology. The males on the other hand present 
a picture of both conscious (L) and uncon- 
scious (K) denial and constriction of behav- 
ior. Perhaps males that become acutely dis- 
turbed tend towards withdrawal and intra- 
tensive behavior, while females tend toward 
a more overtly emotional behavior pattern. 

It seems worth emphasizing that even 
within a relatively homogeneous diagnostic 
group the three “validity” scales differentiated 
Ss on crude a priori social-behavioral criteria 
and can be considered personality measures. 

Because this study is exploratory, it per- 
haps raises more questions than it answers. 
While it is hoped that these three MMPI 
scales will be considered as representing im- 
portant variables and be used as an index of 
adjustment mechanisms and overt behavior 
as well as “validity” scales per se, this study 
has only hinted at two possible dimensions. 
The sample, as noted before, was not ade- 
quate to explore fully what differences, if any, 
exist among the three profiles in Fig. 3, which 
were grouped together. The BD and SA scales 
were operationally defined and based on exist- 
ing clinical records. Hence, they were limited 
by that data which was common to all pa- 
tients. 

Summary 


At the Roanoke VA Hospital, 50 male and 
50 female schizophrenic patients were rated 
on a priori scales of behavioral disturbance 
(BD) and social adjustment (SA) from their 
clinical records. They were divided into poor 
and good SA, mild and severe BD, and then 
compared on the L-F-K scales of the MMPI. 
It was found that Ss with severe BD had a 
“caret”-shaped L-F-K profile, while Ss with 
little BD had a “V”-shaped profile. When BD 
was combined with SA, three distinct profiles 
emerged, indicating that the “validity” scales 
seem to measure personality characteristics. 


Received June 2, 1958. 
This experiment is concerned with the effects 
of anxiety, stress instructions, and difficulty 
of stimulus material. An attempt is made 
to extend the theory and empirical findings 
regarding these variables to performance on 
projective devices. 

Experiments concerning the effect of stress- 
ful situations and subjects’ anxiety level on 
learning and performance have been reviewed 
by Lazarus, Deese, and Osler (1952), Child 
(1954), and Taylor (1956). It has been gen- 
erally found that when the level of difficulty 
within a task is varied, Ss’ anxiety level does 
not hinder and sometimes aids functioning 
on simpler aspects of the task used. It has 
also been found that Ss’ anxiety and stress 
instructions interact so that high test anxious 
(HA) Ss are more impaired in stress situa- 
tions than low test anxious (LA) Ss. 

The theory used here to understand these 
findings has been stated by Mandler and 
Sarason (1952), Sarason, Mandler, and Craig- 
hill (1952), and Child (1954). Briefly the 
theory may be outlined as follows: 

1. HA Ss tend to react to test situations 
with achievement-related, anxiety-reducing re- 
sponses. Anxiety-reducing responses are largely 
task inappropriate. They may be self-relevant, 
aggressive, or in other ways competing and 
incompatible with the tasks. To this extent, 
they impede performance. 

2. LA Ss tend to react to test situations 
primarily with task appropriate responses. LA 
Ss do not tend to make inappropriate anxiety- 
reducing responses in a test situation. 

¹ This article is based upon a dissertation submitted 
to Yale University in partial fulfillment of the 
requirement for the degree of doctor of philosophy. 
The author would like to acknowledge S. B. 
Sarason’s contribution and encouragement. 

3a. When stress is introduced via ego-involving 
instructions, expectations of failure 
and punishment are aroused along with 
achievement motivation. As a drive-produc- 
ing cue, stress instructions serve to elicit re- 
sponses which a given S has previously learned 
to make to stressful test situations. HA Ss re- 
spond with anxiety, related to the achieve- 
ment motive in conflict with expected failure 
and punishment. Thus, HA Ss in a stress 
situation tend to have greater anxiety and 
strengthened anxiety-reducing responses. 

3b. LA Ss have their prepotent mode of 
responding to a test situation strengthened by 
stress. For such Ss, stress does not serve as 
a cue for the introduction of anxiety and 
anxiety-reducing responses, but to increase 
achievement motivation and strengthen task 
appropriate responses. 

4. Increasing the difficulty of a problem 
produces a conflict situation. Difficulty, as 
here used, has been defined by Farber and 
Spence (1953) as a function of the number 
of incompatible and competing response tend- 
encies S must choose among in performing on 
a given unit of a task. 

4a. HA Ss react to difficult, conflict-pro- 
ducing test situations with anxiety-reducing 
reactions. 

4b. Increasing task complexity for LA Ss 
does not induce anxiety and anxiety-reducing 
responses. 

Assumptions 4, 4a, and 4b lead to the pre- 
diction that Ss’ anxiety, stress instructions, 
and task complexity interact. HA Ss com- 
pared to LA Ss should therefore show greater 
impairment as test materials become more 
difficult, and this effect should be increased 
by stress instructions. This prediction was 
tested on a motor learning task, a Rorschach- 
like perception problem, and a word associa- 
tion test. 

Siipola (1950) and Siipola, Kuns, and Taylor 
(1950) experimented with the effect of 
color on Rorschach reactions. It was found 
that forms which are colored compatibly with 
the most frequently seen percept elicit rapid 
reaction times compared to incompatibly 
colored forms. Where color and form are in- 
compatible, S is faced with competing re- 
sponse tendencies leading to longer reaction 
times. The Rorschach situation is thus con- 
ceived by Siipola as containing a “difficulty” 
continuum. A more difficult Rorschach prob- 
lem is one which sets competing and incom- 
patible response tendencies into operation. It 
is predicted that HA Ss in comparison to 
LA Ss respond to Rorschach stimuli with a 
greater difference in reaction time between 
compatibly (less difficult) and incompatibly 
(more difficult) colored blots. This effect will 
be augmented by stress instructions. 

Similarly, data presented by Rapaport 
(1946, pp. 44, 55) indicate that stimulus 
words associated with longer response times 
tend to elicit comparatively more response 
words than stimulus words associated with 
relatively quick reaction times. These data 
suggest that the word association test con- 
tains a difficulty continuum. More difficult 
words are those which elicit relatively many 
different responses and are associated with 
longer reaction times. It is predicted that HA 
Ss compared to LA Ss will have a greater dif- 
ference in reaction time between less difficult 
and more difficult words on the word asso- 
ciation test. This effect will be increased by 
stress instructions. 


Method 


Four groups of 10 Ss each were used. Two 
of these groups were extreme HA Ss, and two 
were extreme LA Ss as determined by the 
Sarason test anxiety scale. One group of HA 
Ss and one group of LA Ss were given task in- 
structions designed to induce stress. The re- 
maining two groups were given neutral in- 
structions. All Ss were presented with the 
following tasks in the order listed: (a) a 
motor learning task, (b) a free association 
word list, and (c) a Rorschach-like percep- 
tual task. 


Tasks 


Motor learning experiment. A “bell maze” 
was constructed of eight wooden blocks 
mounted on a baseboard. Each of these 
wooden blocks had a different number of 
ordinary doorbell buzzers mounted on it. The 
number of buzzers on each block varied from 
two to nine. Each S was required to proceed 
from block to block and to learn the correct 
buzzer on each block. The correct buzzer was 
the one which rang when pressed. Thus, S 
was required to learn the correct path through 
a series of eight blocks. The problem con- 
tained three different correct paths, only one 
of which was activated at a given trial. After 
three practice trials, S was given 18 trials 
(six trials with each path) in which to learn 
the task. S was told which path he was to at- 
tempt before each trial. The number of errors 
made on each block on every trial was re- 
corded. 

Word association experiment. Twenty rela- 
tively “nonemotional” words were culled from 
Kent and Rosanoff's (1927) and Rapaport’s 
word association lists. Ten of these words 
were extremely highly associated with a single 
response word (E words). Ten were extremely 
low with respect to this variable (D words). 
These two groups of words were matched for 
frequency in written language according to 
the Thorndike word count (1921). Ss were 
required to state the first association which 
came to mind. Reaction time and the associa- 
tions elicited were recorded. Data concerning 
the word list appear in Wiener (1954, Ap- 
pendix B). 

Perception experiment. Thirteen ink blots 
were used. Seven blots were colored com- 
patibly with the most frequently elicited re- 
sponse (E blots). Six blots were colored in- 
compatibly with the most frequently elicited 
response (D blots). Eleven of the 13 blots 
used were details taken from the standard 
Rorschach blots. Of these, nine were used by 
Siipola (1950). The blots, their source, and 
data concerning associated responses appear 
in Wiener (1954, Appendix B). 


Instructions 


Ss given stress instructions were led to be- 
lieve that their intelligence, abilities, and per- 
sonalities were being intensively studied and 
that data obtained from this experiment would 
be compared with their school records. In the 
initially presented motor learning task, Ss 
were told that impossible-to-attain scores were 
the norm for students who have an average 
adjustment. These false norms were rein- 
stated during the course of the experiment (at 
the sixth and twelfth learning trials) in an 
attempt to increase stress as the experiment 
progressed. None of the stress Ss attained the 
suggested norms. 

Nonstress Ss were told that the task mate- 
rials were being pretested and that the re- 
sults in no way bore any relation to their 
abilities. Before the motor learning task was 
presented, nonstress Ss were correctly told 
that the problem was impossible to master 
and that they were not expected to do so, but 
that they should do their best. 


Subjects 


All 40 Ss were male students in an intro- 
ductory undergraduate course in psychology. 
They were given the Sarason test anxiety 
schedule about three months previously. None 
of the Ss verbalized any suspected connec- 
tion between the anxiety schedule and this 
experiment. HA Ss were in the top 10% of 
the distribution. LA Ss were within the lower 
10%. A total of 432 Ss comprised the popu- 
lation of the test anxiety scores. Ss were sup- 
plied to the experimenter so that he did not 
know whether a given S belonged to the HA 
or LA category. 


Results 


Motor learning experiment. It was predicted 
that anxiety level, stress instructions, and diffi- 
culty interact: HA Ss compared to LA Ss 
make more errors on the more difficult part 
of the problem than on the less difficult part. 
This effect should be greater under stress in- 
structions than under nonstress instructions. 

Table 1 indicates that there is no evidence 
that HA nonstress Ss, compared to LA nonstress 
Ss, make significantly more errors on 
D compared to E aspects of the task (t < 1).² 
HA stress Ss compared to LA stress Ss 
are more impaired on the more difficult aspects 
of the test (t = 2.08, 18 df, p = .03). 
The interaction among anxiety level, stress 
instructions, and difficulty level is significant 
only at the .12 level (t = 1.11, 36 df). 

² The statistical procedures used in testing hypotheses 
involving the motor learning task are outlined 
and discussed by Wiener (1954, Appendix D). 
Analysis of variance or mean difference statistics 
were not used because of scaling problems associated 
with different levels of difficulty. 

Gerald Wiener 

Table 1 also indicates that LA stress Ss 
compared to LA nonstress Ss make fewer 
errors on the more difficult part of the task 
(t = 3.69, 18 df, p < .01). This finding 
considered alone would suggest that stress- 
ful situations are associated with heightened 
achievement motivation in LA Ss. However, 
data from Table 1 show also that HA stress 
Ss compared to HA nonstress Ss make fewer 
errors on the more difficult part of the problem 
(t = 3.23, 18 df, p < .01, two-tailed 
test of significance). This unexpected finding 
may indicate that when all 18 learning trials 
are considered, the enhancing effect of in- 
creased motivation to do well obscures any 
impairing effect of the interfering responses. 

In summary, HA stress Ss compared to LA 
stress Ss are impaired more as difficulty in- 
creases. This suggests that interfering anxiety- 
reducing responses hinder performances as a 
function of difficulty. LA stress Ss compared 
to LA nonstress Ss made fewer errors as diffi- 
culty increased. This implies that the effect of 
increased motivation, without corresponding 
interfering responses, is to perform better as 
difficulty increases. However, HA stress Ss 
compared to HA nonstress Ss were more effi- 
cient on difficult parts of the task. It is pos- 
sible that HA stress Ss experienced both im- 
pairing anxiety as well as heightened motiva- 
tion to do well. 

Perception experiment. It was predicted 
that instructions, Ss’ anxiety level, and difficulty 
level interact so that (a) HA stress Ss 
compared to LA stress Ss take longer in reacting 
to D than to E blots, and (b) this 
effect is greater under stress than under nonstress 
instructions. Examination of the D-E 
difference in Table 2 shows that this interaction 
obtains in fact (F = 5.03, 1 and 36 df, 
p = .02).³ 


Table 1 
Mean Errors on Simple (E) and Difficult (D) Aspects 
of the Motor Learning Problem 

            Stress            Nonstress 
          E        D        E        D 
HA      63.9    161.8      9.2    180.5 
LA      49.5    122.2     52      156.1 


Table 2 
Mean Reaction Times in Seconds to 
E and D Ink Blots 

                      Stress          Nonstress 
                    HA      LA      HA      LA 
E blots            4.40    6.74    5.      3.88 
D blots            8.20    7.25            5.35 
D-E Difference     3.80     .51     .84    1.37 

HA stress Ss compared to LA stress Ss 
have longer reaction times to D than to E 
blots (t = 2.20, 18 df, p = .02). Also, HA 
stress Ss compared to HA nonstress Ss have 
longer reaction times to D than E blots (t = 
2.48, 18 df, p = .01). 
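
The interaction tested on the blot data can be read as a difference of differences among the four D-E means. The following is a minimal sketch in Python of that arithmetic only; it is not the analysis of variance reported above. The function name is hypothetical, and the four means are read from the D-E row of Table 2 on the assumption that the decimal points fall as .51 and .84.

```python
def interaction_contrast(ha_stress, la_stress, ha_nonstress, la_nonstress):
    """Return (HA - LA under stress) minus (HA - LA under nonstress)."""
    return (ha_stress - la_stress) - (ha_nonstress - la_nonstress)


# Mean D-E reaction-time differences in seconds (assumed reading of Table 2).
de_means = {
    "ha_stress": 3.80,
    "la_stress": 0.51,
    "ha_nonstress": 0.84,
    "la_nonstress": 1.37,
}

contrast = interaction_contrast(**de_means)
# A positive value means the HA-LA gap in D-E slowing is larger under stress.
print(round(contrast, 2))
```

A positive contrast is in the direction predicted; whether it is reliable is a question for the F test, which also needs the within-group variability that the table of means does not carry.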

Word association experiment. Differential 
reaction times to D and E words did not 
prove to be a useful response measure. None 
of the comparisons among groups approached 
significance, although all Ss responded with 
greater latency to D than E words (t = 3.70, 
39 df, p < .01). Ss’ anxiety level and stress 
instructions did not appear to affect differen- 
tial reaction times. In view of this, the degree 
of the disturbance of the associated word 
was utilized to reflect impaired functioning. 
Using Rapaport’s criteria and Janis’s (1947) 
restatement of them, each S was scored for 
the percentage of disturbance on E and D 
words. All types of disturbances were grouped 
together. The data are summarized in Table 3. 

HA Ss compared to LA Ss respond with 
more disturbed reactions to D compared with 
E words (F = 4.96, 1 and 36 df, p = .02). 
HA stress Ss compared to LA stress Ss have 
more disturbed associations to D words (t = 
2.89, 18 df, p = .01). This effect does not obtain 
in nonstress situations (t < 1). The interaction 
between anxiety, instructions, and 
the E-D variable is not significant (F < 1). 
LA stress Ss compared to LA nonstress Ss 
tend nonsignificantly to be impaired less in proceeding 
from E to D words in terms of where 
they respond with disturbed associations (t = 
1.40, 18 df, p = .09). 

When the content of the response word is 
considered, HA stress Ss respond with more 
disturbed associations on D words, whereas 
LA stress Ss tend not to respond differentially 
to E and D words. 

³ Wiener (1954, Appendix D) provides a rationale 
for utilizing analysis of variance statistics for 
projective tasks. 


Table 3 
Mean Difference (D-E) Scores of Percentage of 
Word Association Disturbance 

                               Stress    Nonstress 
D-E Disturbance Difference 


Discussion 

Responses to diagnostic projective tests may 
be in part a function of how individuals with 
different personality characteristics respond to 
variations of conflicting cues or difficulty lev- 
els within problem-solving situations. Test 
anxious Ss experience anxiety specifically in 
a test situation—a situation in which achieve- 
ment motives are elicited. This anxiety prob- 
ably arises in part as a response to a previ- 
ously learned conflict which is induced by the 
elicited achievement motivation. Therefore, it 
is assumed that in the test situation, stress 
instructions given to HA Ss evoke an incre- 
ment in the degree of anxiety these Ss experi- 
ence. HA Ss, then, react to achievement-re- 
lated conflicts with heightened anxiety, which 
leads to impairing anxiety-reducing responses. 

It is postulated that HA Ss are individuals 
who react to all test-related conflicts with im- 
pairing anxiety, whether the conflict is inter- 
nal or external to the HA individual. Since 
HA stress Ss react differently to E and D 
items, it is suggested that there is an inter- 
action between Ss’ internal conflicts and the 
external conflict situation: HA stress Ss ap- 
proach a conflict situation with anxiety and 
therefore cannot effectively resolve the prob- 
lems presented by a conflict situation. Thus, 
conflict situations engender anxiety in HA Ss 
so as to impair performances, and perform- 
ances in conflicting situations are also im- 
paired as a function of Ss’ anxiety level. 
Child’s interpretation seems very much simi- 
lar to this: “. . . in complex situations, where 
the subject is already in conflict between vari- 
ous response tendencies relevant to the task, 
the presence of irrelevant responses made to 
anxiety heightens the conflict and interferes 
with performance . . .” (1954, p. 154). 

It is assumed that LA stress Ss compared 
to LA nonstress Ss have more achievement 
motivation and that both groups are rela- 
tively lacking in anxiety-produced interfering 
responses to test situations. Increased moti- 
vation, then, appears to be associated with 
less impairment as a function of response 
competition. Possibly, low motivation is as- 
sociated with a task attitude of not attending 
to the more difficult aspects of the problem, 
whereas heightened motivation causes Ss to 
attend to E and D items equally well. 

The results suggest that HA stress Ss and 
LA nonstress Ss tend to manifest similar overt 
behavior. The tendency for these two groups 
to behave similarly is associated with differ- 
ent dynamic antecedents. HA stress Ss are 
impeded by their anxiety from dealing effec- 
tively with the external problem situation. It 
appears that LA nonstress Ss, because of their 
relative lack of involvement, are not moti- 
vated to deal effectively with an external 
problem situation. 

It is possible that HA stress Ss may have 
experienced increased motivation to do well 
along with impairing anxiety-reducing re- 
sponses. If this were so, the facilitating effect 
of increased motivation might partially ob- 
scure the impairing effect produced by inter- 
fering responses. 


Summary 


This study was designed to test the effects 
of anxiety, stress instructions, and difficulty 
level on a motor learning task and to general- 
ize the effects of these variables to a Ror- 
schach-like perceptual task and a word asso- 
ciation task. 

Four groups of 10 Ss each were used: a 
high test anxious (HA) and a low test anxious 
(LA) group under stress instructions; HA 
and LA groups under nonstressful instruc- 
tions. Each of these four groups was given a 
motor learning, word association, and per- 
ceptual task in the order listed. All experi- 
mental materials contained simple (E) and 
difficult (D) aspects. The D aspects con- 
tained a relatively greater amount of com- 
peting cues. 

HA stress Ss compared to LA stress Ss 
were more impaired going from E to D 
items within the tasks. This finding was obtained 
for the motor learning, perception, and 
word association tasks. It is assumed that HA 
individuals under stress instructions experi- 
ence interfering anxiety-reducing responses 
and these interfering responses interact with 
conflicting environmental cues to cause im- 
paired performance. 

Stress instructions appear to affect LA Ss 
so that LA stress Ss compared to LA non- 
stress Ss tend not to be impaired going from 
E to D items. LA stress Ss compared to LA 
nonstress Ss make fewer errors on the D as- 
pect of the motor learning task and tend not 
to perform differently to E and D words of 
the word association task in responding with 
disturbed associations. 

This experiment’s results suggest that the 
response competition definition of difficulty 
can be applied to projective materials. Pro- 
jective tasks, in part, may be viewed as prob- 
lem-solving situations. 
RELATIONSHIPS AMONG DIRECT AND INDIRECT 
MEASURES OF THE ACHIEVEMENT MOTIVE 
AND OVERT BEHAVIOR 
Ohio State University 


Investigators planning studies which include 
n Achievement as one of the relevant vari- 
ables can choose among several tests which 
purport to measure n Achievement. One of the 
most commonly used measures of n Achieve- 
ment is McClelland, Atkinson, and Lowell’s 
TAT-fantasy method (1953). The Personal 
Preference Schedule (PPS) recently published 
by Edwards (1954) also yields a score for n 
Achievement, a score which is more easily 
obtained in comparison to the fantasy method. 
Despite the great dissimilarity of these two 
methods, both have been used for the same 
purpose (predicting some behavior supposedly 
related to the strength of n Achievement). It 
seems necessary that some comparisons be 
made to determine whether these tests do in 
fact measure the same thing in the same way. 
McClelland (1958) has cited an unpublished 
study by Birney which reported a lack of any 
relationship (r = -.002, N = 300) between 
the TAT-fantasy and PPS measures of n 
Achievement. Additional data as to their util- 
ity for predicting overt achievement-oriented 
behavior would also be of some importance. 
In the absence of such information it seems 
inadvisable to make generalizations concern- 
ing achievement behavior when the data be- 
ing compared were obtained by the different 
methods. 

The purpose of the present study was to de- 
termine what relationships exist among three 
measures of achievement behavior. The three 
tests used were the TAT-fantasy method, 
the n Achievement scale of the PPS, and a 
sociometric measure. Previous studies have 
typically investigated the relationship of n 
Achievement scores to specific task performances 
in experimental settings. The sociometric 
measure which served as the criterion 
variable in the present study would seem to 
have the advantage of greater generality in 
terms of real-life behavior. 

Rotter’s Social Learning Theory (SLT) 
(1954, 1955) states that to maximize pre- 
diction from test responses it is necessary to 
differentiate among, and then evaluate sys- 
tematically, the internal characteristics of the 
individual (his goal preferences and expect- 
ancies). The psychological situation must be 
treated as a third determinant of goal-di- 
rected behavior. Several studies dealing with 
achievement motivation offer empirical sup- 
port for this conceptual scheme. French 
(1955), and in particular Atkinson (1953) 
and others (Atkinson & Raphelson, 1956; 
Atkinson & Reitman, 1956) have emphasized 
the need to take account of the achievement 
expectancies aroused by situational cues in 
order to maximize the prediction of overt 
achievement-oriented behavior from a meas- 
ure of the strength of internal motivation. 

Neither the fantasy measure nor the PPS 
makes any systematic use of the concepts of 
expectancy, goal preference, or the psycho- 
logical situation. It seemed likely, therefore, 
that the fantasy and PPS measures of n 
Achievement would yield low correlations with 
a criterion measure of overt achievement be- 
havior. In addition, SLT would lead one to 
expect that the fantasy and PPS measures 
would have a low correlation with each other. 
This latter assumption is based on the belief 
that the cues provided by a relatively un- 
structured fantasy situation (indirect meas- 
ure) will lead the subjects (Ss) to develop 
expectancies for satisfactions that differ in 
kind from those developed in a highly struc- 
tured “self-report” situation (direct measure). 
Should an individual's expectancies in regard 
to two psychological tests differ greatly, then 
the scores obtained on the tests are not likely 
to be highly congruent. 


Method 
Subjects 


Forty-four male undergraduate students 
comprising the entire membership of a social 
fraternity at Ohio State University participated 
in the study. Each S was a member of 
the fraternity for at least six months and a 
student at the University for at least one 
year. The cooperation of the Ss was obtained 
by paying the fraternity for the approximately 
55 minutes necessary for testing. 


Procedure 


Each S wrote six four-minute imaginative 
(TAT-type) stories under neutral group con- 
ditions according to the standard procedure 
of McClelland et al. (1953). Cards 2, 8, 1, 7, 
28, and 4 of the standard series were adminis- 
tered in that order. The fantasy n Achieve- 
ment scores were obtained by scoring the 
stories for Achievement Imagery and Achieve- 
ment Thema according to the definitions of 
McClelland et al. (1953). The possible range 
of scores was 0 to 12. A mean of 5.5 and a 
standard deviation of 2.51 were obtained for 
the fantasy test. A measure of the interrater 
reliability for the fantasy test was obtained 
by correlating (Pearson r) the scores ob- 
tained by the writer with those independ- 
ently obtained by another scorer. The result- 
ing correlation, based on the stories of 20 Ss, 
was .96. 

Following the completion of the stories, the 
Ss were given 55 pairs of items from the PPS, 
28 of which comprised the n Achievement 
scale of the PPS. The remaining 27 items 
were used as buffers. The mean score for the 
n Achievement scale of the PPS was 12.4, 
with a standard deviation of 3.41. The pres- 
ent mean of 12.4 is significantly lower (p < 
.001) than the mean reported by Edwards 
(15.6) for the males in his normative group 
of college students. This difference can be at- 
tributed to the highly selected nature of the 
present sample, which consisted entirely of 
fraternity members who were all enrolled in 
the College of Agriculture. The standard de- 
viation of 3.41 does not differ significantly 
from the standard deviation of 4.13 reported 
by Edwards. 
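
The size of the difference between the present mean and Edwards' normative mean can be gauged with a rough calculation. The sketch below, in Python, treats the normative mean of 15.6 as a fixed reference value and computes a one-sample t from the reported mean, standard deviation, and N of 44. This is an assumption made for illustration: the test actually used is not stated, and the function name is hypothetical.

```python
import math


def one_sample_t(sample_mean, sample_sd, n, reference_mean):
    """t statistic for a sample mean against a fixed reference value (n - 1 df)."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - reference_mean) / standard_error


# Reported values: mean 12.4, SD 3.41, 44 Ss; Edwards' normative mean 15.6.
t_value = one_sample_t(12.4, 3.41, 44, 15.6)
print(round(t_value, 2))
```

Under these assumptions the statistic is about -6.2 on 43 df, which is in keeping with the reported p < .001.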

The Ss were then administered a socio- 
metric measure of achievement behavior which 
was adapted from Fitzgerald’s (1958) measure 
of overt dependency.¹ This instrument, 
which is based on a nominating technique, 
served as the criterion measure and yielded 
a quantified index of overt achievement be- 
havior. Two descriptions were devised for the 
present study to describe the extremes of 
achievement behavior. The statements de- 
scribing behaviors considered most repre- 
sentative of overt achievement-striving were: 


Description A 


Studies long and hard before exams 

Regardless of the activity strives to be best 

Likes to demonstrate his abilities and make a good 
impression 

Sets very high standards for himself which he 
strives to attain 


The statements describing behaviors consid- 
ered least representative of overt achievement- 
striving were: 


Description B 


Finds reasons not to study before exams 

Doesn’t seem concerned whether he comes out first 
or last in competitive activities 

Seldom gets involved in tasks requiring great skill 
or effort 

Seldom strives to improve his performance 


Each S was provided a copy of the descrip- 
tions which were labeled Description A and 
Description B, along with a list of the names 
of all the fraternity members. Ss were then 
asked to select from the list of members “the 
name of the man who, in general, best fits 
Description A.” The Ss then selected the name 
of the man who “best fits Description B.” 
Following this selection, the Ss chose the man 
who “next best fits Description A,” and then 
the man who “next best fits Description B.” 
The nominations were thus made alternately, 
and in this manner each S made three nomi- 
nations for each description. The nominations 
each S received on Description A were scored 
+3 for “best fits,” +2 for “next best fits,” 
and +1 for “third best fits.” Similarly, the 
nominations for Description B were scored 
—3, —2, and —1. An S’s total score was the 
algebraic sum of all the nominations he re- 
ceived. A constant was added to each S’s to- 
tal score to convert all the scores to positive 
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Table 1 
Intercorrelations Among the Measures of 
Need Achievement and Overt 
Achievement Behavior 


Sociometric 
.33* 
09 


*p < .05 level. 


values. The sociometric ratings yielded a cor- 
rected split-half reliability of .94. As in the 
Fitzgerald study, the method of nominating 
only the extremes of the group provided a 
normal distribution of scores. 
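
The nomination scoring described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the member names and ballots are hypothetical, and because the source does not give the constant that was added, the sketch assumes the smallest constant that makes every total at least 1.

```python
from collections import defaultdict

# Weights for the first, second, and third nomination on each description.
WEIGHTS_A = (3, 2, 1)      # Description A: +3, +2, +1
WEIGHTS_B = (-3, -2, -1)   # Description B: -3, -2, -1


def sociometric_scores(ballots, members):
    """Return each member's algebraic sum of weighted nominations.

    ballots: list of (a_choices, b_choices); each is an ordered sequence
    of three member names, best fit first.
    """
    totals = defaultdict(int)
    for a_choices, b_choices in ballots:
        for name, weight in zip(a_choices, WEIGHTS_A):
            totals[name] += weight
        for name, weight in zip(b_choices, WEIGHTS_B):
            totals[name] += weight
    return {m: totals[m] for m in members}


def shift_positive(scores):
    """Add a constant so the lowest score becomes 1 (assumed constant)."""
    lowest = min(scores.values())
    constant = 1 - lowest if lowest <= 0 else 0
    return {m: s + constant for m, s in scores.items()}


# Hypothetical ballots from three raters in a six-member group.
members = ["Al", "Bo", "Cy", "Di", "Ed", "Fu"]
ballots = [
    (("Al", "Bo", "Cy"), ("Fu", "Ed", "Di")),
    (("Bo", "Al", "Di"), ("Fu", "Cy", "Ed")),
    (("Al", "Cy", "Bo"), ("Ed", "Fu", "Di")),
]
raw = sociometric_scores(ballots, members)
positive = shift_positive(raw)
```

Because every ballot contributes +6 and -6, the raw scores always sum to zero, which is why a constant is needed before the scores can all be positive.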


Results 


An analysis of the relationship between the 
fantasy and PPS measures of n Achievement 
and overt achievement behavior was made by 
correlating (Pearson r) the sociometric rat- 
ings with the scores on the fantasy and PPS 
tests. It is evident from Table 1 that there is 
no relationship between the PPS measure of 
the strength of n Achievement and the socio- 
metric index of overt achievement behavior. 
There is a significant relationship (r = .33, 
p < .05) between the fantasy measure and
the sociometric ratings. 
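
As a rough check on the significance statement, a Pearson r can be converted to a t statistic with N - 2 degrees of freedom. The sketch below assumes a two-tailed test and takes the .05 critical value for 42 df (about 2.018) from a standard t table; neither detail is stated in the source. The .09 entry is the PPS-sociometric value as read from Table 1.

```python
import math


def t_for_r(r, n):
    """t statistic for testing a Pearson r against zero, with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


N = 44
# Two-tailed .05 critical t for 42 df, from a standard t table (assumption).
T_CRIT_42 = 2.018

t_fantasy = t_for_r(0.33, N)   # fantasy measure vs. sociometric ratings
t_pps = t_for_r(0.09, N)       # PPS vs. sociometric ratings

fantasy_significant = t_fantasy > T_CRIT_42
pps_significant = t_pps > T_CRIT_42
```

Under these assumptions the fantasy correlation clears the critical value and the PPS correlation does not, which agrees with the conclusions drawn in the text.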

The amount of relationship between the 
fantasy and PPS measures of n Achievement 
was evaluated by correlating (Pearson r) the 
scores on the fantasy and PPS tests. Table 1
indicates that the data obtained from these 
instruments are dissimilar and that the fan- 
tasy and PPS tests are not equivalent meas- 
ures of n Achievement. 


Discussion 


According to SLT, the degree of relation- 
ship between a measure of need strength and 
overt behavior is typically of an unimpressive 
order because of the failure to take into con- 
sideration the view that any given situation 
has within it possibilities for the satisfaction 
of many different needs. Which particular 
need an individual will seek to satisfy will 
depend on the value of that satisfaction for 
him, and his expectancy as to whether his be- 
havior will lead to that satisfaction in a par- 
ticular situation. The absence of any relation-
ship between the PPS measure of n Achieve-
ment and the sociometric ratings is therefore 
not unexpected in view of Edwards’ failure to 
systematically distinguish between what an 
individual says he would like to do and what 
he says he has done in the past. It appears 
that the PPS is a confounded measure of n 
Achievement because of the failure to con- 
sistently word items in the same manner and 
differentiate between statements of expectancy 
and statements of goal preference. It can be 
concluded that the PPS is not a valid pre- 
dictor of overt achievement behavior in a col- 
lege population when peer ratings are used as 
the criterion measure. 

The TAT-fantasy measure is significantly 
related to the sociometric ratings and appears 
to be a more valid predictor of overt achieve- 
ment behavior than the PPS. It should be 
noted that reliabilities between .54 and .78 
have been reported for a six-picture TAT 
measure of n Achievement (Haber & Alpert, 
1958). This suggests that a more substantial, 
but still only moderate, relationship between 
the fantasy measure and overt behavior might 
have been obtained with a more reliable form 
of the fantasy test. For purposes of predic- 
tion, however, the obtained fallible scores 
yield a correlation of .33, which would result 
in only a 5.6% reduction in errors of predic- 
tion. The results of this and earlier studies 
(Atkinson, 1953; Atkinson & Raphelson, 1956; 
Atkinson & Reitman, 1956; French, 1955) 
clearly indicate the necessity of considering 
overt achievement-striving a complex, multi- 
ply determined behavior and suggest, further, 
that the failure to do so is likely to result, at 
best, in only low to moderate correlations be- 
tween a measure of motive strength and some 
behavioral criterion. It is important to em- 
phasize that the fantasy scores of n Achieve- 
ment in the present study were based on a 
theme count. When “clinically” interpreted, 
the fantasy method might be a more valid 
predictor of achievement or other kinds of 
behavior. 
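
The 5.6% figure is consistent with the usual index of forecasting efficiency, 1 - sqrt(1 - r^2). The sketch below reproduces it and also applies the standard correction for attenuation to show what "more substantial, but still only moderate" could mean. The correction is an illustrative assumption, not a computation reported by the authors, and it borrows the Haber and Alpert reliabilities (.54 to .78) and the sociometric reliability (.94) as if they applied to this sample.

```python
import math


def error_reduction(r):
    """Proportional reduction in errors of prediction: 1 - sqrt(1 - r^2)."""
    return 1 - math.sqrt(1 - r ** 2)


def disattenuated(r, rel_x, rel_y):
    """Correlation corrected for unreliability in both measures."""
    return r / math.sqrt(rel_x * rel_y)


r_obs = 0.33
reduction = error_reduction(r_obs)           # proportion, not percent

# Assumed reliabilities: TAT .54 to .78 (Haber & Alpert), sociometric .94.
corrected_low = disattenuated(r_obs, 0.78, 0.94)
corrected_high = disattenuated(r_obs, 0.54, 0.94)
```

Even the more generous correction leaves the estimate below .50, in line with the authors' expectation of a moderate relationship at best.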

The absence of any relationship between
the fantasy and PPS measures (r = -.05, Table 1)
indicates that
these tests are not equivalent measures of n
Achievement and that inferences from one
test cannot be generalized to the other when
n Achievement is the relevant variable. Thus,
the present data support the earlier finding
of Birney, and that of de Charms et al.
(1955), who also found that Ss perform in
dissimilar manners on two tests which sup-
posedly measure the same thing (n Achieve-
ment) when one test is an indirect measure
of the motive and the other a direct measure.
A possible explanation for these findings, on
the basis of SLT, is that the difference in test 
performance may be mediated via the differ- 
ent satisfactions that the Ss expect to obtain 
in the two situations. The fantasy situation 
may provide cues that lead to expectancies 
for satisfactions related to being creative and 
imaginative. In contrast, the self-report situa- 
tion may lead to the development of expect- 
ancies for satisfactions related to being self- 
revealing or introspective. 


Summary 


The purpose of the present study was to de- 
termine the degree of relationship between 
two measures of n Achievement (the TAT- 
fantasy method and the n Achievement scale 
of the PPS) and a criterion measure of overt 
achievement behavior; and to determine the 
degree to which the two n Achievement meas- 
ures yield equivalent scores. On the basis of 
Rotter’s Social Learning Theory, it seemed 
likely that the two measures of n Achievement 
would yield low correlations with a socio- 
metric index of overt achievement behavior. 
It seemed, further, that the two measures of 
n Achievement would have a low correlation 
with each other. 

Forty-four male undergraduate subjects 
who were members of the same social fra- 
ternity were administered the TAT-fantasy 
and PPS tests of n Achievement, followed 
by the sociometric measure which was based 
on a nominating technique. The relationships 
among these instruments were analyzed by 
computing the product-moment correlations 
between the three measures. 

The obtained correlations indicate that 
there is no relationship between the PPS and 
the sociometric measure, and no relationship 
between the PPS and the TAT-fantasy test. 
A significant correlation (r = .33) was ob- 
tained between the TAT-fantasy test of n 
Achievement and the sociometric measure. 
It was concluded that for a college popula- 
tion the fantasy test appears to be more suit- 
able than the PPS for the prediction of overt 
achievement behavior.
The results were interpreted on the basis of
Rotter’s contention that additional variables 
are necessary if greater predictive utility is 
to be obtained with tests that measure the 
strength of internal motivation. The lack of 
relationship between the TAT-fantasy test 
and the PPS was interpreted as being due to 
the different kinds of expectancies which the 
Ss develop in the fantasy and PPS situations. 
The correspondence between the theoretical 
interpretation of the present data and previ- 
ous research findings was noted.


AN APPRAISAL OF TAULBEE AND SISSON’S
“CONFIGURATIONAL ANALYSIS OF MMPI
PROFILES OF PSYCHIATRIC GROUPS”


SOL L. GARFIELD AND JON SINEPS


University of Nebraska College of Medicine ¹


In a recent article in this journal, Taulbee 
and Sisson (1957) presented a technique of 
objective configurational analysis applied to 
the MMPI profiles of schizophrenic and neu- 
rotic patients. Sixteen scale pairs were uti- 
lized, which differentiated significantly two 
criteria and three validation groups of psy- 
chiatric patients. According to the authors, 
“The findings of this study indicate that the 
application of a configurational analysis tech- 
nique for analyzing MMPI data yields an 
effective means for differentiating groups of 
neurotic and schizophrenic patients” (Taul- 
bee & Sisson, 1957, p. 416). 

In their study, Taulbee and Sisson pre- 
sented specific cutoff scores for differentiating 
their schizophrenic and neurotic patients. Ac- 
cording to their scheme, a scale pair score of 
13 or more is a likely indicator of neurosis, 
while a score of 6 or less is considered indica- 
tive of schizophrenia. Any score between these 
two values is considered as “indeterminate,” 
or nondiagnostic for these specific categories 
of patients. They also mentioned that “the 
lack of a normal sample and other psychiatric 
groups limits the application of these find- 
ings” (1957, p. 417). However, they felt that 
where there is a problem of differentiating 
between neurotic and schizophrenic condi- 
tions, “the cautious application of these scale 
pairs may be quite useful” (1957, p. 417). 
Some mention was made also that subsequent 
research indicated that latent schizophrenics, 
neurological cases, and normals frequently 
score in the “indeterminate” range between 
7 and 12. 
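
The cutoff scheme quoted above amounts to a three-way decision rule, sketched here in Python. The function name is hypothetical, and the sketch assumes the scale pair score is a whole number; the example scores are made up.

```python
def classify_scale_pair_score(score):
    """Apply the cutoffs quoted from Taulbee and Sisson (1957)."""
    if score >= 13:
        return "neurosis"
    if score <= 6:
        return "schizophrenia"
    return "indeterminate"   # scores 7 through 12


# Hypothetical scores chosen to fall on both sides of each cutoff.
labels = [classify_scale_pair_score(s) for s in (4, 6, 7, 12, 13, 16)]
```

Note that the rule can only ever return one of these three labels, which is the limitation the present appraisal goes on to examine.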


1 Nebraska Psychiatric Institute. 


The Present Study 


Since, in the clinical situation, most re- 
ferrals are for a general personality and diag- 
nostic appraisal rather than to differentiate 
between two specific categories such as neu- 
rosis and schizophrenia, the pragmatic test 
of the diagnostic value of any technique is to 
see how successive referrals are classified or 
diagnosed. In most instances the diagnostic 
possibilities for a given patient include other 
labels than just schizophrenia or neurosis. An 
effective technique for general clinical practice 
must differentiate not only between two se- 
lected groups of patients, but should be able 
to differentiate these groups from other diag- 
nostic groups. Frequently a diagnostic tech- 
nique which differentiates one pathological 
group from a control group fails when it is 
applied to the range of psychopathology en- 
countered in the clinic or hospital. Conse- 
quently, we were interested in seeing how this 
particular approach works with a regular run 
of diagnostic referrals. 

From our files, 129 case records were se- 
lected. These cases represented all cases of a 
nonorganic nature given the MMPI as part 
of their diagnostic test battery over the past 
two years and for whom there was a specific 
final diagnosis from the psychiatric staff. This 
group of patients was a heterogeneous group 
with diverse diagnoses. Sixteen were diagnosed 
as schizophrenic, 35 as neurotic, and the 
others were distributed among personality 
disorders, antisocial reaction, other psychotic 
conditions, etc. The mean age for the total 
group was 32.7 years (SD 13.4), with the 
means for the various subgroups being close 
to this figure. For example, the mean age for
the schizophrenic group was 31.1 (SD 12.8),
while the mean for the neurotic group was
32.1 years (SD 9.9). The mean educational
level of the total group was 11.1 grades in
school (SD 2.3), and it was practically the
same for all subgroups. The group consisted
of 74 males and 55 females.² Since Taulbee
and Sisson utilized only male subjects, sepa-
rate analyses were made for each sex.

A configurational analysis for each case
was made, following the stated criteria of
Taulbee and Sisson. The cases were sepa-
rated into two groups. All those patients for
whom there was a final diagnosis of either
psychoneurosis or schizophrenia were analyzed
separately. These findings are presented in
Table 1. The data on the other cases were
also analyzed, and the results are given in
Table 2.


2 Comparisons were also made of the mean age
and education for each sex in the various groups and
did not reveal any unusual differences.


Table 1

MMPI Classification of Cases Diagnosed as Schizophrenia and Psychoneurosis

                                    MMPI Classification
Patient Diagnoses   Sex     N    Correct   Incorrect   Indeterminate
Schizophrenia       M       9       6          1             2
                    F       7       4          0             3
                    Both   16      10          1             5
Psychoneurosis      Both   35       7          9            19
Total                      51      17         10            24


As can be seen from Table 1, a majority 
of the patients diagnosed as schizophrenia are 
correctly classified by the MMPI scales. Only 
one case out of the 16 is misclassified, while 
five are categorized as “indeterminate.” How- 
ever, a different pattern of results was ob- 
tained for those patients diagnosed as psy- 
choneurotic. In fact, in the latter group only 
seven patients out of 35 were correctly clas- 
sified, the remainder being misclassified or in 
the indeterminate category. In terms of the 
data presented in Table 1, 33% of these cases
would be correctly diagnosed, almost 20%
would be incorrectly classified, and the re-
mainder would fall in the indeterminate cate-
gory. On the basis of these two diagnostic
groups alone, the profile analysis used here
would appear to have some limitations as an
aid in psychiatric diagnosis. Certainly, the
current results are much less favorable than
those reported by Taulbee and Sisson.


Table 2

MMPI Classification of Cases with Diagnoses Other Than Psychoneurosis and Schizophrenia

                                                   MMPI Classification
Patient Diagnoses                         Sex   Schiz.   PN   Indeterminate
Antisocial reaction (Sociopath)            M      14      1        4
                                           F       1      0        1
Personality disorder                       M      10      2        5
  (Passive-Aggressive; Emot. unstable)     F       4      2        7
Other psychotic conditions                 M       0      2        3
                                           F       3      1        1
Psychophysiological reactions              M       1      1        0
                                           F       1      1        1
Other (Adjust. Diffic.; Alcoholism)        M       5      0        5
                                           F       1      0        1
Total                                             40     10       28
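
The percentages quoted for the 51 schizophrenic and psychoneurotic cases can be reproduced from the counts given in the text and the Total row of Table 1. In this sketch the psychoneurotic incorrect and indeterminate counts are not read directly from the table; they are derived by subtraction from the totals, which is an assumption about how the rows add up.

```python
n_schiz, n_neur = 16, 35

# Stated in the text: one schizophrenic case misclassified, five indeterminate.
schiz_incorrect, schiz_indet = 1, 5
schiz_correct = n_schiz - schiz_incorrect - schiz_indet

neur_correct = 7                            # stated in the text
total_correct, total_incorrect = 17, 10     # Total row of Table 1

# Derived by subtraction (assumption: rows sum to the totals).
neur_incorrect = total_incorrect - schiz_incorrect
neur_indet = n_neur - neur_correct - neur_incorrect

n = n_schiz + n_neur
pct_correct = round(100 * total_correct / n, 1)
pct_incorrect = round(100 * total_incorrect / n, 1)
pct_indet = round(100 * (schiz_indet + neur_indet) / n, 1)
```

The derived figures match the "33%" and "almost 20%" reported, and they put the indeterminate remainder at a little under half of the 51 cases.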

As mentioned earlier, however, when cases 
are referred for psychological evaluation, the 
diagnostic question is not usually limited to 
schizophrenia or psychoneurosis. Frequently, 
many possibilities are entertained, and a more 
practical clinical test of the effectiveness of 
this configurational approach is to see how it 
works with other clinical groups. 

The data secured with other types of pa- 
tients for whom the diagnostic picture is not 
always clear are presented in Table 2. The 
most interesting finding is that slightly over 
one-half of these 78 patients would be classi- 
fied as schizophrenic in terms of the Taulbee- 
Sisson configurational method. This is par- 
ticularly true of patients diagnosed as anti- 
social or sociopathic. A somewhat similar find- 
ing is noted with those patients falling in the 
broad category of personality disorders made 
up predominantly of individuals with passive- 
aggressive patterns and emotional instability. 
In fact, an analysis of Table 2 indicates that 
a schizophrenic diagnosis is much more fre- 
quent from the patient groups represented 
here than is any other. Certainly the psycho- 
neurotic pattern seems to be much more in- 
frequent. In our total group of patients, only 
18 received a clear-cut psychoneurotic score, 
whereas 59 were classified as schizophrenic. 
On the basis of these data, one might antici- 
pate securing “schizophrenic” scores quite fre- 
quently with a broad variety of patient groups. 
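
The tallies in this paragraph can be checked by pooling the two tables. The sketch below is a bookkeeping check only. It assumes that an "incorrect" classification for a schizophrenic patient means a psychoneurotic score and vice versa, and it uses psychoneurotic counts derived by subtraction from the Table 1 totals.

```python
# Each tuple: (schizophrenic score, psychoneurotic score, indeterminate).
other_diagnoses = (40, 10, 28)    # Total row of Table 2, 78 patients
schiz_patients = (10, 1, 5)       # diagnosed schizophrenic, 16 patients
neur_patients = (9, 7, 19)        # diagnosed psychoneurotic, 35 (derived)

groups = (other_diagnoses, schiz_patients, neur_patients)
schiz_scores = sum(g[0] for g in groups)
neur_scores = sum(g[1] for g in groups)
indeterminate = sum(g[2] for g in groups)
total_patients = schiz_scores + neur_scores + indeterminate

# Share of the 78 other-diagnosis patients given a schizophrenic score.
share_other_schiz = other_diagnoses[0] / sum(other_diagnoses)
```

The pooled counts give 59 schizophrenic scores and 18 psychoneurotic scores across 129 patients, and just over half of the 78 other-diagnosis patients score as schizophrenic, as the text states.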


Discussion 


On the basis of the data collected in this 
study, it appears that the profile method of 
Taulbee and Sisson may be less valuable 
in differentiating between schizophrenics and
neurotics than previously reported. It also
does not appear to offer much promise as a 
diagnostic measure when applied to the wide 
range of cases seen in clinical practice. With 
the exception of the small group of schizo- 
phrenics in this study, the psychoneurotic 
and other patient groups provide many false 
positives as well as a large number of inde- 
terminate diagnoses. The false positives seem 
largely to occur with the scales designated as 
indicating a schizophrenic diagnosis. It would 
thus appear that many more incorrect and in- 
determinate diagnoses would be secured than 
would accurate ones. 

Obviously, the same problems of compa- 
rability of sample populations and the reli- 
ability of psychiatric diagnosis apply to the 
current researches as to other studies which 
are based on groups differentiated in terms of 
such diagnostic criteria (Garfield, 1949). If 
such difficulties contribute to the problems of 
replicating the work of other investigators, as 
seems reasonable, it is apparent that the test 
patterns secured in one setting can only be 
applied with extreme caution to other settings. 


Summary 


The configurational analysis of MMPI pro- 
files devised by Taulbee and Sisson was ap- 
plied to a group of 129 patients in different 
diagnostic categories. In terms of the data 
obtained, it appears that this method offers 
little promise as a diagnostic measure when 
applied to unselected samples of patients in 
clinical practice. The perennial problems of 
the comparability of patient samples and the 
reliability of psychiatric diagnoses also are 
pertinent here. 


Received June 16, 1958.


THE EFFECTS OF WARM AND COLD INTERACTION
ON THE ADMINISTRATION AND SCORING OF
AN INTELLIGENCE TEST ¹


JOSEPH MASLING 


Syracuse University 


Many recent studies have examined the in- 
fluence of the psychologist-subject relationship 
in projective testing. Despite the methodo- 
logical errors (Hammond, 1954; Levy, 1956) 
which have characterized some of this re- 
search, there is evidence that subjects do not 
give the same responses to one examiner that 
they give to another, because of the instruc- 
tions (Henry & Rotter, 1956), the reinforce- 
ment given the responses (Wickes, 1956), the 
situation (Klatskin, 1952), and the person- 
ality of the examiner (Lord, 1950; Sanders & 
Cleveland, 1953). Not only does the examiner 
influence the subject’s test behavior, but it 
has been demonstrated (Masling, 1956) that 
a subject by acting “warm” or “cold” can 
influence an examiner’s interpretation of re- 
sponses to a projective test. 

The ambiguity which faces both the ex- 
aminer and the subject in a projective testing 
situation makes it probable that each will be 
influenced by the other in attempting to com- 
plete their respective tasks. In intelligence 
testing, however, the instructions are specific, 
the stimuli are clearly defined, and there are 
right and wrong answers. The examiner who 
administers an intelligence test is required to 
read the questions as stated in the text and 
to evaluate the answers with the aid of the 
scoring manual. 


1 The author is greatly indebted to James Beaber, 
H. O. Beldin, Dan Briggs, Robert Gravely, John 
Henderson, Randall Martin, John McKinney, Louis 
Patulo, Eugene Quarrick, Donald Smith, and Gerald 
Studebaker who so willingly served as examiners; to 
Neila Dunay and Eileen Shapiro who performed com- 
petently as accomplices; and to Bernard Braen, San- 
ford Dean, Howard Friedman, Sidney Orgel, Bertram 
Rothschild, Jerome Schiller, Edward Siegel, and Her- 
bert Tothill who served as judges in various parts of 
the study. 


The examiner-subject relationship in intelli- 
gence testing has not received a great deal of 
attention. During the course of their training 
most examiners are exhorted to establish “rap- 
port” and admonished to be “objective.” The 
“objective” examiner is charged with the re- 
sponsibility of deriving as valid an estimate 
of the intelligence of the subject as can be 
obtained, without regard for his personal atti- 
tudes about the subject. He is thus expected 
to be standardized and depersonalized. The 
purpose of the present study was to investi- 
gate the extent to which an examiner could 
divest himself of personal bias in administer- 
ing and scoring an intelligence test—in this 
case, three verbal subtests of the Wechsler- 
Bellevue II (hereafter called the W-B II)— 
when the subject acted in either a highly 
approving, interested (warm) manner or in 
a persistently rejecting, disinterested (cold) 
manner. The specific hypotheses which were 
tested were as follows: 

I. When an examiner tests two subjects, 
one of whom acts warm to him and the other 
cold, he will be more generous in scoring the 
responses of the warm subject. 

II. During the course of administration of
an intelligence test to two subjects, one of
whom is warm and the other cold, an ex-
aminer will: (a) make more reinforcing state-
ments to the warm subject than the cold; (b)
ask more questions of the warm subject, giv-
ing him the opportunity of clarifying or re-
formulating an answer.


Method 


The interaction. Manipulation of the inter- 
action was effected through the use of attrac- 
tive female accomplices, posing as test sub-
jects (Ss), who acted either warm or cold to
the examiner. In the warm condition, the ac- 
complice acted interested in the examiner and 
in the test; she responded freely to his ques- 
tions and tried to communicate respect and 
liking for him. In the cold condition, the ac- 
complice acted disinterested and bored with 
the test and the examiner; her attitude was 
that of fulfilling an unpleasant class as- 
signment which she wanted to complete as 
soon as possible. She tended to answer pre- 
test interview questions in monosyllables and 
throughout avoided eye contact with the ex- 
aminer. In the middle of each cold session the 
accomplice in a deliberate, calculated fashion 
put on sunglasses, thereby increasing the psy- 
chological distance between herself and the 
examiner. 

Examiners. The examiners were 11 gradu- 
ate students at Syracuse University, four in 
the area of clinical psychology, three each in 
special education and remedial reading, and 
one in the area of developmental psychology. 
Each examiner had completed at least one 
course in the administration of individual tests 
of intelligence and, in addition, six were cur- 
rently enrolled in or had completed at least 
one course in projective testing. The esti- 
mated number of W-Bs each had given previ- 
ous to the study ranged from 18 to over 200, 
with a median of 21. 

When each examiner was asked to partici- 
pate in the study he was told that the author 
was interested in the comparability of various 
short forms of the W-B. He was informed 
that he would administer either two or three 
subtests and that the choice of particular sub- 
tests for each examiner would be made on a 
random basis. The Ss would be two under- 
graduates who were completing their assign- 
ment of participating in psychological experi- 
ments as part of the requirement of the intro- 
ductory course in psychology. The examiners 
were asked not to communicate with each 
other regarding any phase of the study, and 
they were requested to urge the Ss not to dis- 
cuss the study with their friends. 

Testing session. Each examiner adminis- 
tered three subtests—Information, Compre- 
hension, and Similarities—to both Ss, one of 
whom acted warm to him and the other cold. 
Each accomplice had five cold and five warm
roles, with the order of presentation of con-
ditions and accomplice counterbalanced. 

Scoring. To insure uniformity of scoring 
procedures the examiners were requested to 
use as a guide to scoring only the instructions 
found in Wechsler’s (1944) “Measurement of 
Adult Intelligence,” third edition. They were 
asked not to use any supplementary manuals 
or monographs. 

Test responses. The experimenter prepared 
a script for each accomplice to memorize and 
repeat to each examiner, regardless of the in- 
teraction condition she was to establish with 
him. For each accomplice, 14 of these re- 
sponses were written to maximize difficulty in 
scoring. As examples of the answers that were 
used, Accomplice A’s script called for her to 
make these responses: 


Information subtest: 

Item, population: “180 million” 

Item, Faust: “It is an opera. I’ve listened to it, 
heard it. I’m not sure of the composer’s name. It is 
French, or maybe German, something like G-o-n-o-t, 
Gonot. I really don’t know.” (She pronounced the 
name Gō-no.)

Item, Habeas Corpus: “Like a writ of Habeas 
Corpus? That gets you out of jail, when you're in, 
sometimes.” 

Similarities subtest: 

Item, praise and punishment: “They are alike in 
that they are both ways of criticizing other people 
in order to improve their behavior.” 


Accomplice B’s script contained these re- 
sponses: 

Information subtest: 

Item, population: “120 million” 

Item, Koran: “That is the bible of the Arab peo- 
ples.” 

Comprehension subtest: 

Item, laws: “Laws establish a set of rules so that 
everyone knows what is expected of him.” 

Item, land: “Because it is more desirable.” 


Each accomplice was warned of the necessity 
of adhering rigorously to the script assigned 
her without making any substitutions or im- 
provisations. If asked for additional informa- 
tion they were told to repeat the substance of 
the original answer, unless the script provided 
for a clarification of the response. An analysis 
of the typescripts showed that the accom- 
plices did, in fact, repeat the assigned an- 
swers for each of the examiners, with but a 
few minor deviations from the script. 

The testing room was equipped with a tape
recorder and each examiner was asked to re-
cord both sessions. Examiner 9, however, had
the input volume turned so low that it was 
impossible to obtain a typescript from either 
of his tapes, and Examiner 10 was unable to 
work the recorder for his first session. For this 
reason Hypotheses IIa and IIb were tested
using the typescripts of the remaining eight 
examiners. Since all examiners turned in 
scored test forms, Hypothesis I was tested 
with an N of 10. 


Results 


There was no mistaking the impact of the 
warm and cold conditions on the examiners. 
All reported that one S seemed particularly 
disinterested in the test, with some emphasiz- 
ing the notion that this represented “sick” be- 
havior. One examiner correctly guessed that 
the Ss were really accomplices, and he was 
therefore replaced and his data were not used. 
Krasner (1958) has reported that in studies 
attempting to establish conditioning of ver- 
bal behavior without awareness approximately 
5% of the Ss learn what the experimenter is 
trying to do. It is generally accepted practice 
not to use the data on these Ss. 

Hypothesis I was tested by comparing the 
way in which each examiner scored the re- 
sponses given him under the two conditions. 
Since the experimenter had intentionally writ- 
ten responses that gave a higher “true” IQ 
for one of the accomplices than for the other,
the bias of examiner scoring was determined
from the mean of the 10 scores given each 
accomplice, rather than from the raw scores. 
Once a mean score for each accomplice had 
been obtained, the extent and direction of 
differences from each mean were derived for 
each examiner. To test for the probability of 
obtaining a distribution of differences such as 
this by chance, the Walsh test, as suggested 
by Siegel (1956), was used, yielding a one- 
tailed p value of .056. 

Hypothesis I can also be evaluated by in- 
spection of Table 1. Of the five examiners who 
tested Accomplice A under the warm condi- 
tion, four gave her scores greater than the 
mean, while of the five examiners who tested 
her in the cold condition, four gave her scores 
smaller than the mean. The identical results 
were obtained for Accomplice B: four of the 
five examiners who interacted with her in a 
warm manner gave her scores greater than the 
mean, while four of the five examiners who 
interacted with her in the cold condition gave 
her scores smaller than the mean. 
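
The deviation scoring used for Hypothesis I can be illustrated with the scores given Accomplice A. The values below are taken from Table 1 as far as its layout can be read, so the grouping by condition should be treated as an assumption. The sketch covers only the deviation and sign-count steps, not the Walsh test itself.

```python
from statistics import mean

# Raw scores given Accomplice A, grouped by condition (as read from Table 1).
a_warm = [47, 45, 41, 46, 45]
a_cold = [43, 47, 43, 43, 43]

# Mean of all 10 scores given this accomplice.
a_mean = mean(a_warm + a_cold)

# Each examiner's deviation from the accomplice's mean, to one decimal.
warm_dev = [round(x - a_mean, 1) for x in a_warm]
cold_dev = [round(x - a_mean, 1) for x in a_cold]

warm_above = sum(d > 0 for d in warm_dev)   # warm scores above the mean
cold_below = sum(d < 0 for d in cold_dev)   # cold scores below the mean
```

Scoring each examiner against the accomplice's own mean removes the built-in difference in "true" IQ between the two scripts, so that only the effect of the interaction condition remains.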

In order to test Hypothesis II, typescripts
were made from the tapes of each interview. 
The eight pairs of tapes that were audible 
were transcribed and coded, with identifica- 
tion of the name of examiner, S, and type of 
interaction removed. Two judges were given 
typescripts that began with the first question 
on the Information subtest, the pre- and post-
test interviews having been removed, and they
assigned every examiner remark to one of four
categories: reinforcing statement, punishing
statement, questioning statement, and miscel-
laneous. A reinforcing statement was defined
as any remark designed to encourage, support,
or reassure the S or the response he made
(“OK,” “swell,” “I knew you could answer
that,” etc.). A punishing statement was de-
fined as any remark designed to insult or dis-
parage the S or the response he made (“I
thought everyone knew that answer,” etc.).
A question was defined as any attempt made
to have the S give additional information or
to clarify an answer (“Tell me more about
that,” “Can you be more specific,” etc.). The
miscellaneous category was used for all other
examiner remarks, including such phrases as
“Uh-huh” and “Mm-hmm.”


Table 1

Raw Scores (and Deviations from the Mean) Given Each Accomplice by Examiner
and Interaction Condition

W-Bs Previously Given, Examiners 1 through 10:
(200+), (40), (50), (20), (11), (20), (22), (20), (100+), (18)

Scores given Accomplice A (a), Warm:
47 (+2.7), 45 (+0.7), 41 (-3.3), 46 (+1.7), 45 (+0.7)

Scores given Accomplice A (a), Cold:
43 (-1.3), 47 (+2.7), 43 (-1.3), 43 (-1.3), 43 (-1.3)

Scores given Accomplice B (b), Warm and Cold:
48 (-0.5), 47 (-1.5), 45 (-3.5), 49 (+0.5), 47 (-1.5),
49 (+0.5), 49 (+0.5), 52 (+3.5), 48 (-0.5), 51 (+2.5)

a Mean score = 44.3.
b Mean score = 48.5.


Table 2

Examiner Remarks by Interaction Condition and Accomplice

Columns: Examiner; W-Bs Previously Given; reinforcing (R) and questioning (Q)
remarks under the Warm and Cold conditions for Accomplice A and Accomplice B.

W-Bs Previously Given, Examiners 1 through 8:
(200+), (40), (50), (20), (11), (20), (22), (20)

R = Reinforcing remarks.
Q = Questioning remarks.

There were 285 examiner remarks in the 
testing sections of the 16 interviews. The 
judges independently agreed on the rating of 
254 (89%) of them, the categorization of the 
31 remaining remarks taking place by confer- 
ence. The distribution of reinforcing and ques- 
tioning remarks by examiner, accomplice, and 
condition is found in Table 2. Since the judges
agreed that only two punishing remarks were 
made, no further work was done with this 
category nor was there any analysis of the 
wastebasket miscellaneous category. 
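
The inter-judge agreement figure is a simple proportion, reproduced below from the counts in the text. This is raw percentage agreement; the source reports no chance-corrected index, and none can be computed without the full cross-classification of the judges' ratings.

```python
total_remarks = 285     # examiner remarks in the 16 testing sections
agreed = 254            # remarks the two judges rated identically

agreement = agreed / total_remarks
agreement_pct = round(100 * agreement)

# Remarks whose category was settled by conference between the judges.
resolved_by_conference = total_remarks - agreed
```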

Once the number of reinforcing and ques- 
tioning statements had been obtained, a com- 
parison was made of each examiner's verbal 
behavior during the warm interaction with his
verbal behavior during the cold interaction by
use of the Wilcoxon matched-pairs signed- 
ranks test (Siegel, 1956). Using one-tailed 
probability values for all tests of significance, 
it was found that examiners made more rein- 
forcing statements to the warm Ss (p value 
between .01 and .025), and they asked more 
questions of the warm Ss (p = .025). The 
sum of reinforcing and questioning statements 
was also greater for the warm condition than 
the cold (p between .01 and .025). 
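
For readers unfamiliar with the Wilcoxon matched-pairs signed-ranks test, the sketch below computes an exact one-tailed probability by enumerating every possible pattern of signs, which is practical with only eight pairs. The remark counts are hypothetical and are not the study's data, since the entries of Table 2 are not available here. The function name and the handling of zero differences (dropped) and ties (average ranks) are assumptions that follow common textbook practice.

```python
from itertools import product


def signed_rank_one_tailed(warm, cold):
    """Exact one-tailed signed-ranks probability that warm exceeds cold."""
    # Zero differences are dropped before ranking.
    diffs = [w - c for w, c in zip(warm, cold) if w != c]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while (j + 1 < len(order)
               and abs(diffs[order[j + 1]]) == abs(diffs[order[i]])):
            j += 1
        avg = (i + j) / 2 + 1      # average rank for tied absolute differences
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Under the null hypothesis every pattern of signs is equally likely.
    hits = sum(
        1
        for signs in product((0, 1), repeat=len(ranks))
        if sum(r for r, s in zip(ranks, signs) if s) >= w_plus
    )
    return w_plus, hits / 2 ** len(ranks)


# Hypothetical remark counts for eight examiners (NOT the study's data).
warm = [9, 12, 7, 15, 6, 20, 30, 11]
cold = [5, 10, 8, 9, 6, 14, 21, 4]
w_plus, p = signed_rank_one_tailed(warm, cold)
```

With so few pairs only a limited set of probability values is attainable, which helps explain why the results are reported as ranges such as "between .01 and .025."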

While the experimental hypotheses were 
substantiated, the differences between condi- 
tions seemed much smaller than the differ- 
ences among examiners. For example, Ex- 
aminer 1 made a total of only 14 remarks to 
his Ss, while Examiner 7 made 64. Examiner 
8’s scoring favored the warm condition by 4.8 
points, while Examiner 5's scoring was biased 
in favor of the cold condition by 1.8 points. 
Since it was possible that the more experi- 
enced examiners were least biased by the in- 
teraction, rank-order correlations were com- 
puted between the number of W-Bs previ- 
ously given and the dependent variables. None 
of the rho’s were significantly greater than 
zero. 


Discussion 


The results of this study indicate that the 
examiner-subject interaction influenced the 
psychologist’s behavior in the administration 
and scoring of the three subtests of the W-B 
II. When the instructions to the cold accom-
plice are considered, i.e., to answer in mono-
syllables, to appear disinterested, and the 
typescripts of the sessions studied, it becomes 
clear that the examiners tried to make con- 
tact with the cold S and, in failing to do this, 
became silent. While the examiners were un- 
doubtedly trained to encourage the S, this 
was difficult to do when friendly overtures 
elicited disinterest and rejection. The feelings 
which the interaction aroused in these examin- 
ers obviously influenced the manner in which 
they administered the tests. With warm, re- 
sponsive Ss they tended to encourage and 
question; with cold Ss they tended to remain 
silent. 

It is difficult to predict the extent to which 
this particular finding can be generalized to 
nonlaboratory situations. Probably few indi- 
viduals taking intelligence tests act as hostile 
and nonparticipating as the cold accomplice. 
However, some Ss, notably children, may be- 
come threatened by the testing situation, re- 
sponding with belligerence or silence or other 
variations of avoidance. Schafer (1954) has 
described the dynamics of adult Ss who may 
not interact favorably with the examiner. 
Thus, the cold accomplice represents a cari- 
cature of certain kinds of S behavior, perhaps 
overdone, but posing problems which examin- 
ers face in milder forms with bona fide Ss. 

What effect does the style of administra- 
tion have on the score earned by the S? Since 
the accomplices repeated the scripts prepared 
for them, no direct evidence is available from 
the present study. One experiment (Gordon & 
Durea, 1948) found, however, that a group 
given the Stanford-Binet under conditions of 
discouragement (“You’re not doing well”) 
earned a mean IQ score 6.35 points lower 
than a control group. Lantz (1945) demon- 
strated that the experience of success or fail- 
ure may influence performance on intelligence 
test items. Hutt (1947) concluded that adap- 
tive testing with the Stanford-Binet produced 
higher scores for poorly adjusted children than 
the traditional administration of the test. It 
is very likely, then, that Ss who experience 
discouragement or frustration or who antici- 
pate failure while taking an intelligence test 
may do less well than they might under other 
conditions of interaction. Since poorly adjusted, 
easily threatened Ss may unwittingly 
discourage the examiner from relating to them 
favorably, their performance would suffer, not 
because of lack of potential but because of 
the interaction. This, along with their greater 
anxiety and distractibility, could account for 
the depressed IQ scores ordinarily found in 
poorly adjusted children. 

The interaction also affected the examiners’ 
“objective” judgment of the scoring of rela- 
tively “objective” material. Even though they 
had the Wechsler manual available, a re- 
sponse given in the warm condition tended to 
be given greater credit than the identical re- 
sponse given in the cold condition. This bias 
is even more striking when it is considered 
that the scoring occurred some time after the 
testing, allowing the examiners some perspec- 
tive regarding the events of the session. Again, 
this study exaggerated the situation found in 
most clinic settings, since the examiners were 
given responses that were selected because 
they were difficult to evaluate. However, an 
examination of the scoring records indicated 
that there were systematic differences in scor- 
ing even for those responses which were cited 
as examples in the scoring manual. 

The artificial nature of this study—the use 
of accomplices and relatively unsophisticated 
examiners, the exaggerated nature of the in- 
teraction, the use of ambiguous responses— 
together with the inadequate sampling of 
both the examiner and accomplice populations 
(Hammond, 1954) limits severely the gener- 
alization of these findings to nonlaboratory 
settings of psychologists and Ss. What has 
been demonstrated is that in giving an intelli- 
gence test under these conditions an advanced 
graduate student examiner will respond to the 
way Ss interact with him and will act out in 
administration and scoring his feelings about 
the interpersonal situation. 


Summary 


1. Eleven graduate students, each of whom 
had completed at least one course in the ad- 
ministration of individual intelligence tests, 
administered the Information, Comprehension, 
and Similarities subtests of the Wechsler- 
Bellevue II to two subjects. The test subjects 
were accomplices who acted in either a warm 
or cold role to the examiners, giving as their 
responses memorized answers, 14 of which 
were specifically devised to be difficult to 
score. One examiner became aware of the 
purposes of the experiment and his data were 
not used. Each accomplice had five cold and 
five warm roles, and each examiner saw one 
subject who acted warm and one who acted 
cold. 

2. Each testing session was taped, although 
only eight examiners were able to record 
audible tapes for both of their sessions. From 
the typescripts prepared from these 16 tapes, 
every examiner remark during the course of 
the testing part of the interview was assigned 
one of the following categories: reinforcing 
remark, punishing remark, questioning re- 
mark, miscellaneous. Of the 285 examiner 
statements, two judges independently agreed 
on the rating of 254 of them, for an agree- 
ment of 89%. 

3. The results indicated that in scoring the 
responses, the examiners tended to be more 
lenient to the warm subject than the cold 
(p = .056). The examiners also tended to 
use more reinforcing comments (p between .01 
and .025), and to give more opportunity to 
clarify or correct responses (p = .025) to the 
warm subject. The magnitude of the differ- 
ences in behavior to the two subjects was 
generally small, with individual differences 
more marked than differences due to the effect 
of the interaction. 


Received June 27, 1958. 


MORE CONSTRUCT VALIDATION OF THE 
EGO-STRENGTH SCALE¹ 


IRVING I. GOTTESMAN 


The ego-strength construct has a high 
frequency in the conversations and reports of 
clinical psychologists. Its origins in psycho- 
analytic theory have not deterred the tough- 
minded from its use; indeed, it was in the 
camp of empiricism that a scale purporting 
to measure it was derived. Surprisingly, Bar- 
ron’s (1953) new scale did not lead to an 
avalanche of original research either using it 
or refuting it. Barron (1956) has continued to 
work with the scale, Wirt (1955) has done 
a further validation of the prognostic utility 
of the Ego-Strength (Es) scale, and Tamkin 
(1957) and Tamkin and Klett (1957) have 
concerned themselves with its construct va- 
lidity (Cronbach & Meehl, 1955). Within the 
framework of construct validation, tests of a 
number of hypotheses about ego strength are 
reported in this paper. Es scores on samples 
of adult and adolescent normal and psychi- 
atric subjects were used. In conjunction with 
the data of other researchers, it is intended to 
form enough strands in the Es scale network 
to more nearly outline its meaning and utility. 

1 The author is indebted to the people who made 
the data for this study available and accessible. R. D. 
Wirt was not only the mentor for this paper but also 
cleared the path to data sources and furnished the 
MMPIs for the schizophrenic sample. P. F. Briggs 
obtained access to the Hathaway-Monachesi Study 
raw data, and M. Dunnette made a sample of 
salesmanagers available at the Minnesota Mining and 
Manufacturing Co. The A. H. Wilder Child Guidance 
Clinic, St. Paul, where the author is a fellow in 
psychology, was the source for the sample of 
emotionally disturbed adolescents. 


Origins of the Ego Strength Construct 


As early as 1894, Freud (1924) used the 
construct of ego but did not expound upon it 
rigorously until the publication of Das Ich 
Und Das Es in 1923 (Freud, 1927). His now 
classical topographical conception of mental 
life with its tripartite division was first made 
there. Inasmuch as the topic of consideration 
is ego strength and not ego, this detour must 
be defended on the grounds that an absolute 
concept of strength is without utility; strength 
of any kind is relative to the current and po- 
tential demands put upon it. A complete cata- 
logue of ego functions would be extremely 
lengthy and partly idiosyncratic. Only some 
of the more important will be presented here. 
In Freud’s definitive paper he tells us that 
“the ego has the task of bringing the influence 
of the external world to bear upon the id and 
its tendencies, and endeavors to substitute the 
reality principle for the pleasure principle 
which reigns supreme in the id . . . the test- 
ing of reality is rather one of the functions of 
the ego” (Freud, 1927, p. 30). Further teas- 
ing out of functions has been partly obviated 
by the legatees of the ego, the ego psycholo- 
gists, including A. Freud (1946), Hartmann 
(1950), and Nunberg (1948). Hartmann cap- 
tures the difficulty of the task by first defin- 
ing ego negatively and then concluding that 
it is defined by its functions: “Ego is not 
synonymous with personality, or the individual; 
it does not coincide with the subject 
as opposed to the object of experience; and it 
is by no means only awareness of the feeling 
of one’s own self” (p. 75). The ego organizes 
and controls motility and perception (of self 
as well as the world); it serves as a protective 
barrier against excessive stimuli; action 
and thinking are also ego functions; it anticipates, 
synthesizes, and inhibits. Any combination 
of these functions may occur simultaneously. 
The ego must defend itself against 
id impulses, the reproaches of a punishing 
superego, and the traumata of the real world. 


Implicit in the above is a source and a 
quantity of energy to carry out the functions. 
In analytic contexts the energy is explicitly 
called libido; it in turn emanates from the id 
and is posited as instinctual energy in its 
original form. The available ego libido, in this 
theory, consists of the id libido that has been 
desexualized and deaggressivized, i.e., neutral- 
ized (Hartmann, 1950). It follows that any 
reference to ego strength must, in this con- 
text, pertain to the amount of available libido 
at any given time. Any instrument which at- 
tempts to measure this construct must be 
either a sample or a sign of this available en- 
ergy. The enormity of this task is evident 
since the construct is in constant flux. The 
best that might be aimed for are situational 
confidence limits in its estimation. An ideal 
evaluation would require a balance sheet of 
psychic energy listing the intrapsychic and 
interpersonal assets, liabilities, and reserves. 
Living in a Korean foxhole would be an en- 
vironmental liability, for example, and having 
superior intelligence would be an intrapsychic 
asset. 

Barron’s (1953) conceptualization of ego 
strength, derived from the Es scale item con- 
tent and personality and intelligence test cor- 
relates, involves physiological stability and 
health, a strong sense of reality, feelings of 
personal adequacy and vitality, permissive 
morality, lack of ethnic prejudice, emotional 
outgoingness and spontaneity, and intelligence. 
In the light of the previous discussion 
his sampling of ego functioning may not be 
comprehensive enough for content validity. 


Hypotheses 


If the Es scale really measures ego strength, 
it should be able to discriminate between sam- 
ples about whom relative ego-strength infer- 
ences can be made. Normal adults should 
have higher Es scores than normal adoles- 
cents. Normal adolescents should be higher 
than emotionally disturbed adolescents. Severe 
delinquent adolescents should be between the 
former and the latter. Normal adults should 
be higher than psychoneurotic and psychotic 
adults, and the second, higher than the third. 
Superior adults should have higher Es scores 
than any of the above groups. Ego strength 
has a positive linear relationship to personality 
integration so that Es scores should predict 
the same way. The item content of the 
scale suggested that a fake-good test-taking 
attitude could contribute to plus getting; 
therefore, it was hypothesized that Es was 
highly correlated with the K scale. 


Procedure 


Samples to whom the Minnesota Multi- 
phasic Personality Inventory (MMPI) had 
been administered and about whom relative 
ego-strength inferences could be made were 
gathered. The tests were rescored for Es and 
K. All possible t tests were made in order to 
rank the samples. A rank order correlation 
was computed between the group means for 
Es and K and for one of the subsamples. 
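
The rank-order correlation between group means mentioned above can be sketched as follows. The means are the Es and K values reported for the seven study samples in Table 1; the function names are assumptions introduced for illustration, and the simple difference formula used here presumes no tied ranks, which holds for these seven means.

```python
def rank_descending(values):
    """Rank 1 goes to the highest value; assumes no ties among the values."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def spearman_rho(x, y):
    """Spearman rank-order correlation from squared rank differences."""
    n = len(x)
    d_squared = sum(
        (a - b) ** 2 for a, b in zip(rank_descending(x), rank_descending(y))
    )
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Group means from Table 1, in the order: superior adult, severe delinquent,
# normal adolescent, Hs-Hy profile, schizophrenic, disturbed adolescent, D-Pt profile.
es_means = [53.33, 47.98, 46.19, 40.95, 39.56, 37.90, 37.60]
k_means = [19.90, 13.35, 15.00, 17.90, 15.12, 12.55, 13.52]

rho = spearman_rho(es_means, k_means)
print(round(rho, 2))  # 0.46
```

The sum of squared rank differences is 30, which gives the rho of .46 reported in the Results.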


Samples 

Normal adults. Hathaway and Briggs (1957) 
have provided normative data on some new 
MMPI scales for a sample almost identical 
with the original normative group. Their sum- 
mary statistics for over 500 males and fe- 
males on the two scales were used as refer- 
ence. 

Normal adolescents. A random sample of 
31 boys was drawn from all the ninth-grade 
boys in the Minneapolis public schools who 
had no record of delinquent behavior. The 
raw data from the Hathaway-Monachesi 
study (1953) were the source of the sample. 

Severe delinquent adolescents. A random 
sample of 31 boys was also selected from the 
above study, all of whom had been involved 
in serious delinquencies both before and after 
taking the MMPI. 

Emotionally disturbed adolescents. An ongoing 
sample of 31 adolescents who were referred 
to the A. H. Wilder Child Guidance 
Clinic for diagnostic study was used. The 
sample was about equally divided between 
boys and girls with a median age of 14.5 
years, range, 13 to 18 years. Although all 
were outpatients, some were severe neurotics 
or psychotic. 

Superior adults. A sample of MMPIs of 
applicants for sales positions who were 
employed and subsequently were promoted to 
salesmanagers at the Minnesota Mining and 
Manufacturing Co. was selected. A search of 
the files for the tests of the managers who 
had been hired during the 10-year period in 
which the MMPI was used in selection 
revealed only 16 answer sheets. Five additional 
tests were selected randomly from among 
those of applicants who had successfully met 
the hiring requirements. The total of 21 men 
were superior to the norm group (Hathaway 
& Briggs, 1957), at least in education and 
occupational level. The representativeness of 
this small sample on Es was confirmed by the 
median Es score of a sample of 68 similarly 
employed men at another large Twin City 
corporation.² 

2 Rosen, Ephraim. Personal communication, May 8, 
1958. 

Veterans Administration Hospital D and 
Pt profiles. The MMPIs of all patients 
admitted to the psychiatry service at the 
Minneapolis Veterans Administration Hospital in 
1956 were examined. All tests that had scales 
D and Pt coded high (Welsh, 1948) in that 
order were selected for study. The profile and 
not the diagnosis was the criterion for 
selection. An N of 25 resulted. 

Veterans Administration Hospital Hs and/or 
Hy profiles. The above admission MMPIs 
yielded 21 tests with scales Hs and/or Hy 
coded high. Again the profile was the criterion 
for selection. 

Veterans Administration Hospital 
schizophrenics. The admission MMPIs on a group 
of Veterans Administration Hospital patients 
subsequently diagnosed as schizophrenic were 
selected for study. The sample consisted of 
21 paranoid schizophrenics and 4 diagnosed 
schizophrenic reaction undifferentiated or 
unclassified. Analysis of the data showed no 
reason to handle them by subclass. The 
diagnosis of schizophrenia was the criterion for 
this sample of 25. 


Results 


Table 1 shows the results for Es and K for 
the seven samples. Table 2 shows some of the 
data collected on Es in other studies. 


Table 1 

Es and K Scores for Study Samples 

Group                   N     Es     SD      K     SD 
Superior adult         21  53.33   4.32  19.90   4.07 
Severe delinquent      31  47.98   4.56  13.35   3.90 
Normal adolescent      31  46.19   6.00  15.00   4.36 
Hs-Hy profile          21  40.95   7.23  17.90   5.33 
Schizophrenic          25  39.56   8.78  15.12   6.39 
Disturbed adolescent   31  37.90   7.54  12.55   4.71 
D-Pt profile           25  37.60   6.16  13.52   3.03 


Table 2 

Es Data from Other Studies 

Group                                N     Es     SD  Investigator 
Improved VA hosp. patient           70  55.2    7.00  Wirt 
Air Force officer                  160  52.73   4.05  Barron 
Graduate student                    40  50.92   5.62  Barron 
Greatly improved clinic patient      9  49.66         Barron 
Greatly improved VA hosp. patient   19  48.6    7.03  Wirt 
Normal male                        226  44.33   6.21  Hathaway & Briggs 
Improved clinic patient             27  43.07         Barron 
VA mental hygiene clinic patient    52  41.79   7.38  Barron 
Normal female                      315  40.21   6.36  Hathaway & Briggs 
Unimproved VA hosp. patient        114  33.3    6.96  Wirt 
Unimproved clinic patient           16  32.75         Barron 


A series of two-tailed t tests for the Es 
scores gave results which may be displayed 
as follows. Groups which are connected by 
either an overscored or underscored line do 
not differ significantly from each other at the 
.05 level. 

[Diagram of groups: Superior Adult, Severe Delinquent, Normal Adolescent, Normal Male, Disturbed Adolescent, Hs-Hy, Sc, D-Pt] 

The normative group of males did not differ 
significantly from the normal adolescents. 
Severe delinquents were significantly higher 
than the normal males. All of the psychiatric 
groups were significantly lower than the 
nonpsychiatric groups. Some error has been left 
in the disturbed adolescent comparisons by 
not separating boys and girls; the normative 
data showed a raw score difference between 
sexes of four points in favor of males. 

A series of t tests for the K scores of these 
groups gave these results: 

[Diagram of groups: Superior Adult, Hs-Hy, Sc, Adolescent, Severe Delinquent, Male, Disturbed Adolescent] 

A rank order correlation between Es and K 
for the seven study samples was computed. 
Rho equalled .46; with these few pairs one of 
.78 was needed for significance at the .05 
level. Another rho was computed for the same 
variables for the subsample of 16 
salesmanagers, and a correlation of .49 was obtained, 
which was significant beyond the .05 level. 
Barron (1953) obtained r’s of .31 for his 
samples of clinic patients and graduate students. 


Discussion 


A parsimonious interpretation of the results 
appears to be that the Es scale broadly 
discriminated between psychiatric and 
nonpsychiatric adults and adolescents. It did not 
discriminate different degrees of psychiatric 
incapacitation. It did not discriminate 
delinquent adolescents from nondelinquents, and 
predicted in the opposite direction for the 
former compared to normal adults. It showed 
no relation to age within the limits of this 
study. Its previously demonstrated utility as 
a predictor of favorable response to 
psychotherapy may be questioned, since severe 
delinquents, who are generally considered poor 
bets for psychotherapy, did not differ 
significantly from patients rated greatly improved. 

The correlation between Es and K was 
higher than that originally reported and 
suggested some interpretations of what Es may 
be picking up. It may reflect a defensive 
test-taking attitude and the ability to recognize 
socially desirable descriptions of personality. 
Both of these behaviors show contact with 
and testing of reality. Inasmuch as ego 
defense and reality testing are both ego 
functions, the scale would measure these two 
aspects of ego strength. It would, however, only 
indicate strength if the defense were not 
carried to an extreme. Reacting to the threat of 
not being selected for a job and reacting to a 
threat from persecutory delusions would both 
contribute to plus getting on the Es scale. An 
increasing investment of energy in the defense 
of the ego eventually reaches a point of di- 
minishing returns. The process detracts from 
the energy needed for other ego functions such 
as integration and reality mastering; thus a 
very high Es score would reflect weakness in- 
stead of strength. An empirical suggestion for 
the point of diminishing returns for clinic and 
hospitalized patients comes from the Barron 
(1953) and Wirt (1955) prognosis studies. 
Groups above or below a raw Es score of 
about 49 showed less improvement than those 
near this score. 

The significantly higher Es scores for severe 
delinquents relate to Barron’s (1956) finding 
of a cluster of Q sort items descriptive of ag- 
gressiveness, power-orientation, and disregard 
of the rights of others, which correlated posi- 
tively with Es scores on his Air Force officer 
sample. If added to this line of reasoning are 
the high mean Es for the salesmanagers to- 
gether with the knowledge of above average 
scores on Pd for them, the delinquents, and 
the officers, another interpretation of Es is 
suggested. The scale may be a sign of psychic 
energy per se, whether it be thought of as 
coming from the ego or the id. The inference 
as to source in any particular instance would 
be based on the case history materials and 
the rest of the MMPI profile. Without addi- 
tional information, psychopaths might be 
false positives in selection of good bets for 
psychotherapy. 

Using Es scores in profile analysis to sup- 
plement information provided by the other 
scales is suggested by some of the data that 
now exist. A combination of high K with high 
Es in nonhospitalized subjects implies defensiveness 
with the intent to appear socially desirable. 
In psychiatric patients the same pattern 
suggests ego syntonic behavior and a 
refusal to undertake psychotherapy or, if un- 
dertaken, lack of cooperation in the process. 
A high Es combined with a high Pd suggests 
freedom from intrapsychic conflict; the ag- 
gression and hostility in the personality are 
ego syntonic. A low Es combined with a high 
Pd suggests that the hostility is not ego syn- 
tonic and that the subject may be an acting 
out neurotic. All of these clinical hunches have 
yet to be verified but are suggestive of the 
uses to which the Es scale might be put. 

In conclusion, the multiplicity of ego func- 
tions makes it difficult to assess ego strength 
by observing or knowing the state of a few of 
these functions. Adding to the problem is the 
fact that an excessive amount of energy de- 
voted to one function could be pathological 
but would be manifested by a high score on 
the Es scale. It appears that energy not avail- 
able to the ego may manifest itself on the Es 
scale. Consistent interpretations of what the 
scale does depend both upon the particular 
kind of subject and his test-taking attitude. 
Any valid technique for the assessment of ego 
strength must conceive of the ego as a complex 
system and then must combine the results 
of an intrasystemic analysis with intersystemic 
and interpersonal data. 


Summary 


More construct validation of the Ego- 
Strength scale has been described together 
with a rationale for the properties such a 
scale should ideally possess. The Es scale was 
administered to a variety of samples: superior 
adults, psychiatric adults, emotionally dis- 
turbed adolescents, normal adolescents, and 
severe delinquents. The scale broadly discriminated 
between psychiatric and nonpsychiatric 
groups. Suggestions for other interpretations 
of what the Es scale may be doing 
were made, as were suggestions for its use in 
profile analysis. 


Received June 27, 1958. 


VALIDATION OF A NEED SCALING TECHNIQUE 
FOR THE ADJECTIVE CHECK LIST 


ALFRED B. HEILBRUN, JR. 


State University of Iowa 


The Adjective Check List (ACL) is a test 
devised by Gough (1955) for personality re- 
search. It contains 300 adjectives which can 
be selectively checked as being characteristic 
of an S’s behavior. Gough suggests three ad- 
vantages of this type of measure—a wide 
scope of behavior can be evaluated, the fa- 
miliar words provide a meaningful task for 
the rater, and the presence-or-absence check- 
ing response assures analytic ease. However, 
one disadvantage of the check list as it is 
typically used is that it provides informa- 
tion which is conceptually circumscribed. For 
example, research may determine that two 
groups of Ss are distinguishable by the differ- 
ential frequency of several adjectives checked 
as characteristic. Although of interest, such 
findings offer limited basis for behavior pre- 
diction or theory construction. 

One method of increasing the conceptual 
utility of the ACL would be to score responses 
by adjective clusters measuring new and 
broader behavior dimensions. LaForge and 
Suczek (cited in Gough, 1950) did this by 
using a sixteen-fold classification of interper- 
sonal behaviors, while researchers at the In- 
stitute for Personality Assessment and Re- 
search (Gough, 1950, p. 38) developed clus- 
ters intended to measure self-acceptance and 
self-criticality. 

Another approach was taken by Heilbrun 
(1958) who developed 15 need scales for the 
ACL, utilizing needs initially described by 
Murray (1938) and later incorporated into 
the Personal Preference Schedule (PPS) by 
Edwards (1954). It was found that the rank 
order correlation between relative need levels 
as evaluated by the need scales and the PPS 
was .60, significant at the .05 level. With per- 
sonal desirability held constant, the first-order 
partial correlation dropped to .35, which did 


not differ significantly from a zero correla- 
tion. Since there has been a limited amount 
of published validational data for the PPS, 
the rather low order correlation did not neces- 
sarily deny the usefulness of the ACL need 
scales. However, it did point out the neces- 
sity of relating need scores with external (non- 
test) criteria in way of evaluating their cur- 
rent utility for personality research. The pres- 
ent study assessed the validity of five of the 
15 need scales by relating test performances 
to such external criteria. 
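
The drop from .60 to .35 when personal desirability is held constant follows the usual first-order partial correlation formula, sketched below. Only the zero-order value of .60 comes from the text; the two correlations with desirability (.62 each) are hypothetical values chosen for illustration, since the source does not report them, and the function name is an assumption.

```python
from math import sqrt


def first_order_partial(r_xy, r_xz, r_yz):
    """Correlation between x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


r_scales_pps = 0.60           # reported correlation between ACL need scales and PPS
r_scales_desirability = 0.62  # hypothetical, not reported in the source
r_pps_desirability = 0.62     # hypothetical, not reported in the source

partial = first_order_partial(r_scales_pps, r_scales_desirability, r_pps_desirability)
print(round(partial, 2))  # 0.35
```

The sketch shows only the mechanism: when both measures correlate substantially with a third variable, removing that variable can cut the apparent agreement between them considerably.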


Procedure 
Subjects 


The Ss for this study were 99 members of 
an undergraduate college course in psychol- 
ogy. This number included 56 females and 
43 males. 


Task 


The present scales for the ACL were first 
rationally derived by having 20 advanced 
graduate students judge which adjectives, if 
checked, would indicate a high level of the 
following needs: achievement, deference, or- 
der, exhibition, autonomy, affiliation, intra- 
ception, succorance, dominance, abasement, 
nurturance, change, endurance, heterosexual- 
ity, and aggression. Agreement among 9 of 
the 20 judges was required for inclusion of an 
adjective on a need scale. The number of 
adjectives on these scales ranged from 19 
(succorance) through 33 (aggression). The 
resultant adjective clusters were those used 
in a previous study (Heilbrun, 1958). These 
scales were later revised by having 19 ad- 
vanced graduate students judge which adjec- 
tives, if checked, would contraindicate a high 
level of a given need. Agreement among 9 
of the 19 judges was required for inclusion. 
The number of contraindicative adjectives for 
each scale ranged from 7 (intraception) to 36 
(aggression). 

A need score on the ACL was obtained by 
subtracting the number of contraindicative 
adjectives checked by S from the number of 
adjectives indicative of a high need level 
which were endorsed as self-descriptive. 
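
The scoring rule just described can be written as a short sketch. The adjective keys below are hypothetical stand-ins, not the actual need keys, and the function name `need_score` is an assumption introduced for illustration.

```python
def need_score(checked, indicative, contraindicative):
    """Indicative adjectives checked minus contraindicative adjectives checked."""
    checked = set(checked)
    return len(checked & set(indicative)) - len(checked & set(contraindicative))


# Hypothetical keys for one need scale (not the published keys).
indicative_key = {"ambitious", "determined", "industrious", "persistent"}
contraindicative_key = {"lazy", "apathetic", "quitting"}

# Adjectives one S checked as self-descriptive; "friendly" is on neither key.
checked_by_s = {"ambitious", "persistent", "lazy", "friendly"}

score = need_score(checked_by_s, indicative_key, contraindicative_key)
print(score)  # 1
```

Because contraindicative adjectives are subtracted, a need score can be negative when an S checks more contraindicative than indicative adjectives.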


Procedure 


All members of an undergraduate psychol- 
ogy course were given the ACL under stand- 
ard instructions on the first day of class. Ap- 
proximately three months later factual infor- 
mation was obtained by questionnaire from 
the Ss and from university records. These data 
were selected as validating criteria for five 
need scales—achievement, exhibition, affilia- 
tion, nurturance, and abasement. The behav- 
iors which will be presented as descriptive of 
these needs are taken from Edwards (1954) 
and were used by the judges in deriving the 
need scales. 


Results 
Achievement 


The following behaviors characterize the 
achievement need: 


To do one’s best, to be successful, to accomplish 
tasks requiring skill and effort, to be a recognized 
authority, to accomplish something of great signifi- 
cance, to do a difficult job well, to solve difficult 
problems and puzzles, to be able to do things better 
than others, to write a great novel or play. 


1 These need keys may be obtained without charge 
from Alfred B. Heilbrun, Jr., Department of Psychology, 
State University of Iowa, Iowa City, Iowa. 


Table 1 

Comparison of Means for Validation of 
Five ACL Need Scales 

Need           N    Mean     SD      t    df     p 
Achievement 
  High         9   12.89   5.93   2.12    58   .04 
  Low         51    8.51   5.82 
Exhibition 
  High        24   14.13   4.84   2.13    44   .04 
  Low         22   10.95   4.89 
Affiliation 
  High        25   15.04   7.36   2.17    47 
  Low         24   10.17   7.79 
Nurturance 
  High        32   5272 1.7323 
  Low         36    4.05   2.34 
Abasement 
  High        26   -1.04   4.92   2.69    48   .01 
  Low         24    2.67   4.71 


The criterion for achievement need was 
college grade point average (GPA) with an S’s 
estimated intelligence held constant. The 
intellectual estimate was made from a vocabulary 
test included in the entrance battery 
administered to all entering freshmen at the 
State University of Iowa. The distributions 
of available GPAs and vocabulary percentile 
scores were made for the experimental Ss and 
each distribution was divided into quartiles. 
A high achiever (HA) was defined as an S 
whose GPA rank was at least two quartiles 
above his vocabulary rank. Those Ss who 
appeared in the third quartile of the vocabulary 
distribution and the fourth quartile of the 
GPA distribution were not considered nor 
were those Ss considered who fell in the fourth 
quartiles, respectively, of these distributions. 
The rationale for elimination was that in 
neither case was it possible for an S to 
qualify as an HA because of the high vo- 
cabulary standing. 
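
The high-achiever rule described above can be sketched as a small classification function. This is one reading of the rule under stated assumptions: quartiles are numbered 1 (lowest) to 4 (highest), the excluded Ss are taken to be those in the fourth GPA quartile whose vocabulary standing is in the third or fourth quartile, and the function name and labels are hypothetical.

```python
def classify_achiever(vocab_quartile, gpa_quartile):
    """Label an S from quartile ranks (1 = lowest, 4 = highest; an assumption)."""
    if gpa_quartile - vocab_quartile >= 2:
        return "HA"  # GPA rank at least two quartiles above vocabulary rank
    if gpa_quartile == 4 and vocab_quartile in (3, 4):
        # Top GPA quartile, but vocabulary standing too high to qualify as HA.
        return "excluded"
    return "remainder"


for vocab, gpa in [(1, 3), (2, 4), (3, 4), (4, 4), (2, 3)]:
    print(vocab, gpa, classify_achiever(vocab, gpa))
```

Under this reading an S in the third or fourth vocabulary quartile can never be labeled HA, which is the rationale the text gives for setting such cases aside.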

The performances of HAs on the Achieve- 
ment scale were compared with the remaining 
Ss, and the results are presented in Table 1. 
It can be seen that the high achievement 
group scored significantly higher on the 
Achievement scale than did the remainder of 
the experimental Ss. 
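
The comparison of means for the Achievement scale can be approximated from the summary values in Table 1. The sketch below assumes a pooled-variance t test and standard deviations computed with n - 1 in the denominator; the source does not say which formulas were used, so the result is close to the reported t of 2.12 rather than identical to it.

```python
from math import sqrt


def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Two-sample t from summary statistics, assuming a common variance."""
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Achievement rows of Table 1: high achievers versus the remaining Ss.
t_value, df = pooled_t(9, 12.89, 5.93, 51, 8.51, 5.82)
print(round(t_value, 2), df)  # 2.08 58
```

The degrees of freedom match the table exactly, and the small gap between 2.08 and the reported 2.12 is of the size that rounding of the published means and standard deviations, or a different variance estimate, could produce.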

Exhibition 

Edwards describes exhibition as follows: 

To say witty sayings, to tell amusing jokes and 
stories, to talk about personal adventures and ex- 
periences, to have others notice and comment upon 
one’s appearance, to say things just to see what 
effect it will have on others, to talk about personal 
achievements, to be the center of attention, to use 
words that others do not know the meaning of, to 
ask questions others cannot answer. 


It is obvious from these descriptive behav- 
iors that a person with a high exhibition need 
would seek social interactions with others, 
since their responses would be necessary for 
need satisfaction. It seems reasonable to assume 
that such a person would be more group 
oriented than would a person without high 
need exhibition. Therefore, the validating cri- 
terion for the Exhibition scale was the num- 
ber of group activities (social, academic, ath- 
letic, church, etc.) of which S had been a 
voluntary member from the time he had en- 
tered high school until the present. The error 
introduced by age differences between Ss, 
which would provide a greater time span since 
entering high school, was minimal because of 
the homogeneous age make-up of the experi- 
mental group (M = 21.08; SD = 3.09). Also 
an appreciable relationship between age and 
Exhibition scale scores would be necessary 
before any systematic error would be intro- 
duced, and a performance comparison of the 
22 youngest and oldest Ss on this scale did 
not support this contention (t = 0.11; p =
.92).

The number of groups joined by those Ss 
who fell in roughly the highest and lowest 
quartile of scores on the Exhibition scale was 
compared. The comparison, shown in Table 1, 
shows that high scorers on the scale were 
members of a reliably greater number of 
groups than were low scorers. 


Affiliation


Behaviors descriptive of a high affiliation 
need are: 


To be loyal to friends, to participate in friendly 
groups, to do things for friends, to form new friend- 
ships, to make as many friends as possible, to share 
things with friends, to do things with friends rather 
than alone, to form strong attachments, to write 
letters to friends. 


The criterion which was selected for the 
Affiliation scale was the number of friends 
which an S had. Experimental Ss were asked 
to list the names of people they had met 
within the previous five years whom they 
would consider “good friends.” This definition
of a “good friend” was provided: “A person 
whose company you seek and enjoy and with 
whom you would be willing to discuss impor- 
tant personal experiences.” Relatives were not 
included. 

The mean number of people considered to 
be “good friends” by Ss whose scores fell
within roughly the highest and lowest quar-
tiles on the Affiliation scale is given in
Table 1. These data demonstrate a signifi- 
cantly greater number of people considered to 
be “good friends” by Ss scoring highest on the 
Affiliation scale than by Ss scoring lowest. 


Nurturance 


Nurturant behaviors are described in this 
manner: 


To help friends when they are in trouble, to assist 
others less fortunate, to treat others with kindness 
and sympathy, to forgive others, to do small favors 
for others, to be generous with others, to sympathize 
with others who are hurt or sick, to show a great 
deal of affection toward others, to have others con- 
fide in one about personal problems. 


Validation of the Nurturance scale was at- 
tempted by relating it to the number of chari- 
table, medical research, rehabilitation, church, 
or educational activities to which S had 
contributed time, money, or personal effects 
within the previous two years. The mean fre- 
quencies of contributions made by Ss falling 
approximately in the upper third and lower 
third of scores on the Nurturance scale are 
shown in Table 1. It can be seen that Ss scor- 
ing high on the Nurturance scale made reli- 
ably more contributions than did the lower 
scoring Ss. 


Abasement 


Characteristic behaviors associated with a 
need for abasement are as follows: 


To feel guilty when one does something wrong, to 
accept blame when things do not go right, to feel 
that personal pain and misery do more good than 
harm, to feel the need for punishment for wrong 
doing, to feel better when giving in and avoiding a 
fight than when having one’s way, to feel depressed 
by inability to handle situations, to feel timid in the 
presence of superiors, to feel inferior to others in 
most respects. 


The validating procedure involved the pre- 
diction of expected course grade by the ex- 
perimental Ss at the first class meeting. The 
predicted grade for each S was compared with 
his then current cumulative grade point aver- 
age, these differences providing the criterion 
measure. The specific prediction made was 
that Ss who estimated course grades markedly 
in excess of their current average would show 
lower scores on the Abasement scale. 

The distribution of discrepancy scores was 




Table 2

Significance of Differences in “Irrelevant” Criterion Performance of Approximately the Highest and
Lowest Quartiles of Ss on the Various Need Scales

Columns: Achievement Scale, Exhibition Scale, Affiliation Scale, Nurturance Scale, Abasement Scale

“Irrelevant” Criterion    Legible entries (column positions not recoverable)
Achievement
Exhibition                .66
Affiliation               .59    1.90
Nurturance                1.14   .43
Abasement

* Significant at p < .01 level.
** Significant at p < .03 level.


separated at roughly the highest and lowest 
40%, high in this case referring to predicted 
course grades markedly above grade point av- 
erage. The mean Abasement scale scores for 
these two groups are found in Table 1. These 
data show, as predicted, that Ss with high 
discrepancy scores were reliably lower on the 
Abasement scale than Ss with low discrep- 
ancy scores. 


Adequacy of the Criteria 


Despite the consistent finding of significant 
relationships between the five need scales and 
the selected criteria, one further analysis was 
necessary before the results could be accepted 
as validating this mode of personality assess- 
ment. It should be demonstrated that the se- 
lected criteria are relevant. One method of 
evaluating relevancy would be to determine 
the relationships between each criterion and 
the four scales which were not validated 
against it. These relationships should be less 
clear than were those previously reported be- 
tween need scales and “relevant” criteria. The 
check was made by comparing the “irrele- 
vant” criterion performance of roughly the 
highest and lowest quartiles of Ss on each 
scale. An exception to this procedure was 
made for the Achievement scale criterion 
which was not a continuous variable. In this 
case, the performances of the criterion groups 
(nine high achievers, 51 remaining Ss) on the 
“irrelevant” scales were compared. The results 
of this analysis are presented in Table 2. It 
can be seen that only two of the 20 relation- 
ships reached the .05 level of significance 




(one would be expected by chance). This 
would seem to support the contention that the 
variables selected for this study are relevant 
validating criteria.²
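
The chance expectation cited above follows from simple arithmetic: 20 tests at the .05 level yield 20 × .05 = 1 expected significant result. The sketch below also gives the probability of two or more significant results, under the added assumption (not made in the article) that the 20 tests are independent; the variable and function names are illustrative.

```python
from math import comb

n_tests = 20      # "irrelevant" scale-criterion comparisons in Table 2
alpha = 0.05      # significance level used in the text

expected_hits = n_tests * alpha

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); assumes independent tests."""
    return 1.0 - sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k))

print(f"expected significant by chance: {expected_hits:.1f}")
print(f"P(2 or more of 20 by chance): {prob_at_least(2, n_tests, alpha):.3f}")
```

Under the independence assumption, two or more significant results out of 20 would arise by chance roughly a quarter of the time, which is in keeping with the author's reading that the observed two do not indicate systematic relationships.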


Discussion 


The finding that performance on each of 
the five need scales for the ACL was signifi- 
cantly related to its criterion variable gives 
rather clear support to this technique of per- 
sonality assessment. It should be stressed that 
the ability of a scale to reliably discriminate 
between extremes on a criterion dimension 
does not necessarily mean that such an in- 
strument would be useful for individual pre- 
diction. However, since the ACL and the need 
scales which are currently being investigated 
are intended as research tools, individual pre- 
diction is not a crucial factor. The status of 
current psychological research is such that 
group predictions or group discriminations are 
the rule rather than the exception. The ques- 
tion of whether this technique of adjective en- 
dorsement has potential clinical utility would 
certainly seem to be premature, if not pre- 
sumptuous, at this stage of development. The 


2 A sixth need scale, dominance, was initially in-
cluded in this validational study but was eliminated
following the analysis of relationships of the selected
criteria with “irrelevant” scales. Although the domi-
nance scale demonstrated the predicted relationship
with number of elected group offices held, high 
achievement, affiliation, and nurturance Ss also re- 
ported significantly (p< .05) more elected offices 
held than did Ss low on these scales. Accordingly, 
the criterion was not deemed an adequate validating 
dimension for the dominance scale.


research utility of an easily administered, 
short, objectively scored measure of motiva- 
tional status is evidenced by the expanding 
literature on the Manifest Anxiety scale de- 
veloped by Taylor (1956). The current goal 
in developing the need scales is to expand the 
possibilities for group motivational research. 

Since four of the five validating criteria 
were obtained by verbal report of the experi- 
mental Ss, the question of dissimulation could 
be raised. There are two factors which would 
seem to curtail the influence of this source of 
error. The first is that E was also the class- 
room instructor, and it was felt that a rather 
good rapport existed at the time the autobio- 
graphical material was obtained. The second 
factor is that Ss were required to make their 
responses explicit by naming groups, friends, 
and recipients of donations. Accordingly, dis- 
tortion under these circumstances would have 
to approach outright prevarication. In light of 
the rapport, promised confidentiality of the 
material, and a frank exhortation for honesty 
by E, deliberate distortion should not have 
been a major source of error. 

Dissimulation on the ACL, when it is ad- 
ministered as a self-descriptive task, still re- 
mains a problem. It has been demonstrated 
(Heilbrun, 1958) that frequency of endorse- 
ment for adjective need clusters is highly re- 
lated to judged social desirability of the ad- 
jectives characteristic of these needs. The ex- 
tent to which this source of test performance 
variance contributes to predictive error is not 
clear, since it seems defensible to assume that
some human social behavior is also highly re-
lated to a person’s judged desirability of the
behavior in question. Thus, merely because 
test performance is determined to some ex- 
tent by desirability of test alternatives does 
not preclude behavioral prediction. However, 
the extent of the correlation between fre- 
quency of adjective endorsement and judged 
social desirability of the adjectives (ρ = .98)
suggests that error is involved. Further re- 
search is necessary to evaluate the magnitude 
of testing error introduced by desirability fac- 
tors and to investigate corrective procedures 
on the ACL. 

Summary 


This study related performance of 99 col-
lege Ss upon five rationally derived need scales
for the Adjective Check List to appropriate
validating criteria by way of evaluating the
usefulness of this mode of objective person- 
ality appraisal. 

High achieving Ss, as defined by relatively 
high grade point averages with estimated in- 
telligence held constant, scored significantly 
higher on the Achievement scale than did Ss 
who did not show this characteristic. 

High scorers on the Exhibition scale had 
entered into significantly more group activi- 
ties than had Ss scoring low on this scale. 

Ss with high scores on the Affiliation scale 
had met a reliably greater number of people 
within the previous five years whom they de- 
scribed as “good friends” than had Ss with 
low scale scores. 

High performers on the Nurturance scale 
had made a reliably greater number of con- 
tributions of time, money, or personal effects 
than low performers. 

Ss who, when asked to predict their course 
grades, made predictions markedly in excess 
of their current grade point averages scored 
significantly lower on the Abasement scale 
than Ss who showed smaller discrepancies. 

Adequacy of the validating criteria was in- 
dicated by the failure to find clear relation- 
ships between these criteria and those scales 
for which they were “irrelevant” in this study. 

The findings were seen as indicating the po- 
tential usefulness of this technique of objec- 
tive need measurement in personality research. 


Received June 27, 1958.


THE SCREENING VALUE OF THE CORNELL
MEDICAL INDEX


M. POWELL LAWTON ¹


Norristown State Hospital, Norristown, Pennsylvania 


The Cornell Medical Index Health Ques- 
tionnaire (CMI) has seen wide use as a de- 
vice for screening possible psychiatric pa- 
tients. In the author’s own experience it has 
proved particularly useful because, among 
all self-administering questionnaires, it seems 
least threatening to the average person. The 
nonthreatening quality, which undoubtedly is 
due to the medical orientation of most of the 
questions, is especially valuable when one 
must select emotionally disturbed persons 
from a population which includes many av- 
erage people with a layman’s prejudices and 
fears regarding emotional illness. Particularly 
efficient use may be made of such a short, 
nonthreatening test when a group is to be 
tested for research purposes and there is not 
time for individual interview and rating of 
patients. Further information is needed, how- 
ever, as to whether the CMI actually discrimi- 
nates the reasonably normal person from the 
clinically maladjusted person, and whether it 
discriminates among degrees of maladjust- 
ment. 

The authors of the test have provided some 
confirmation of the former question (Brod- 
man, Erdmann, Lorge, Deutschberger, & 
Wolff, 1954; Brodman, Erdmann, Lorge, 
Gershenson, & Wolff, 1952a, 1952b) by dis-
tinguishing psychiatric general hospital pa-
tients from other hospital patients, and psy-
chiatric army “rejects” from psychiatric “ac-
cepts.” The second question, which has not
been investigated, is particularly relevant if 
the test is to be used as a criterion for extent 
of psychopathology in research populations, 
as suggested by the authors (Brodman, Erd- 
mann, & Wolff, 1955, p. 3). 

1 This study was initiated while the author was at
the Veterans Administration Hospital, Providence,
Rhode Island.


The present study attempted to determine 
(a) whether the CMI actually discriminates 
degrees of maladjustment, (b) what are the 
best indices for such discrimination, and (c) 
whether chronic illness is a variable that tends 
to weight scores in the direction of psycho- 
pathology. 


Procedure 


A rough order of “likelihood of nervous 
symptoms” was arrived at in the following 
manner. CMI blanks were given to 116 con- 
secutive male veteran admissions to a general 
hospital (excluding those who were critically 
ill or uncooperative), and to 34 consecutive 
nonpsychotic admissions to the psychiatric 
service. The treatment files on each patient 
for present, as well as past, admissions to the 
hospital were consulted, and after careful 
reading of all available data the patient was 
placed in one of the following four categories: 

I. Patients whose records contained no inti- 
mation of any psychological disturbance or 
the presence of any illness considered “psy- 
chosomatic” (N = 43). 

II. Patients whose records contained either
some notation by a physician of nervousness
or tension² (N = 17) or an official diagnosis
of an illness mentioned as showing prominent
emotional concomitants by Weiss and English
(1949) (N = 31).³ Since such a spontaneous
evaluation by a physician might refer to a
“nervous” trait either in the normal or the
pathological range, and since a “psychoso-
matic” diagnosis implies greater likelihood of
emotional symptomatology than absence of
such medical diagnosis, it was believed that
a good screening device should detect more
maladjusted individuals in Group II than in
Group I.

2 Typical physicians’ comments were “marked emo-
tional lability,” “feels tense and jittery,” “a rather
nervous individual.”

3 This psychosomatic group consisted of 15 cases
of peptic ulcer, 9 essential hypertension, 3 diabetes
mellitus, 3 bronchial asthma, and one migraine head-
ache.

III. Patients who were at that time hos-
pitalized on the medical and surgical services
and whose records indicated an official diag-
nosis of a psychiatric condition, either for the
previous hospitalization or on previous occa-
sions (N = 25). It was reasoned that the pa-
tient whose disturbance was marked enough
to warrant an official psychiatric diagnosis
was more likely to show a wide range of
symptoms than patients in Group II.

IV. Patients at the time hospitalized on
the psychiatric service (N = 34). They should
logically show a wider range of disturbances
than those in Group III, whose psychiatric
disability was probably not causing their hos-
pitalization at the time of taking the test.
Whenever a patient’s record showed descrip-
tions applying both to Group II and either
Group III or Group IV, he was placed in the
group of greater maladjustment. Such an or-
dering of degree of maladjustment, while quite
rough, seems logically defensible and has the
advantage of being unbiased and relatively
objective.

Table 1 lists the composition of the groups.
An analysis of variance and Bartlett’s test in-
dicate that the age means and variances do
not differ significantly among groups. Inas-
much as education has been shown to be un-
related to CMI scores (Brodman, Erdmann,
Lorge, & Wolff, 1953), it was not considered
in this study.


Table 1

Age Composition of the Experimental Groups

Group           N     Mean    Sigma
I              43     39.5    15.71
II             48     43.5    14.93
III            25     42.6    11.52
IV             34     40.2    11.94
All groups    150     41.5    14.07


Results


The authors suggest (Brodman et al., 1955)
that psychiatric illness be suspected if the pa-
tient’s CMI protocol is characterized by one
or more of the following signs: (a) Total
Score (number of Yes responses) 30 or more;
(b) Page 4 Score 3 or more (“Moods and
feelings” items on page 4 of the question-
naire); (c) Sections I and J Score 3 or more
(questions dealing with fatigue and frequency
of illness); (d) Omit, etc., 4 or more (Ques-
tions Omitted, answered both Yes and No, or
with changes or comments added). In addi-
tion, the discriminating power of the follow-
ing indices of maladjustment was evaluated
by classifying as pathological any patient who
showed (e) Areas Score: 7 areas or more
among the “body complaints” with more than
one Yes response (questionnaire pages 1-3);
(f) Page 1-3 Score 28 or more (body com-
plaint questions).


Table 2

Cumulative Percentages of Total Score by Groups

Score    Group I   Group II   Group III   Group IV
10          74        85         92          97
20          42        62         72          85
25          40        47         52          82
30          26        40         48          76
35          19        34         40          71
40          19        17         40          71
50           9        11         28          62
[60]         5         6          8          35
70           2         2          4          18

(The bracketed score label is inferred; it is missing in the source.)


Table 3

Cumulative Percentages of Page 4 Scores by Group

Score    Group I   Group II   Group III   Group IV
[?]         60        67         84          94
[?]         51        60         76          88
[?]         35        50         60          79
[?]         26        48         54          79
[?]         21        42         4?          79
[?]         21        37         4?          79
[?]         19        31         40          79
[?]         19        31         40          79
[?]         16        27         24          76

(The Score column is not legible in the source; rows are listed in order of increasing cutting score.)


Table 4

Median Scores on C.M.I. Measures and Percentages of Patients Pathologically Diagnosed by Each Measure

              Total Score     Page 4      Sect. I & J    Omit, etc.      Areas        Page 1-3
Group         Median   %    Median   %    Median   %    Median   %    Median   %    Median   %
I               18    26       2    35       0    30       0    16       4    33      15    23
II              23    40       2    50       1    35       1    23       5    45      18    34
III             26    48       4    60       3    54       1    24       6    64      22    40
IV              53    76       7    79     [?]    71       2    35     [?]    76      34   [?]
Chi square        20.4
p                <.001

(Row IV is incomplete in the source, which reads “53 76 7 7 71 2 35 76 34”; bracketed question marks show values that could not be placed.)


(a) Total Score. The cumulative percent-
age distribution of Total Score is indicated in
Table 2 for the four patient groups. For every
cutting point but one (40 or more), until the
frequencies become too small for significance,
this index orders the scores in the predicted
manner: the percentage of Yes responses be-
comes greater as the groups increase in likeli-
hood of emotional illness.

(b) Page 4 Score. Table 3 lists the cumu- 
lative percentages of Yes responses on Ques- 
tionnaire Page 4. Until a cutting point of 
eight is reached, this section of the question- 
naire succeeds in ordering the four groups as 
predicted. 

(c) Sections I and J Score. These results 
are tabulated in Column 3 of Table 4, which 
is a summary table of all the specific scoring 
criteria that were evaluated. This index ap- 
pears to discriminate well among the groups. 

(d) Omit, etc. Column 4 of Table 4 indi- 
cates the relative occurrence of this sign. Dif- 
ferences among groups, though in the pre- 
dicted order, appear to be statistically ran- 
dom. However, since the over-all frequency 
of occurrence was low, it was decided to di- 
chotomize the distribution at the median, 
which was one. Using this cutting point, 
Group I showed 27% at or above the me- 
dian, Group II showed 60% above, Group III 
52%, and Group IV 79%. These differences 
were significant at the .01 level, but the pre- 
dicted direction of differences was reversed 
for Groups II and III.

(e) Areas Score. Column 5 of Table 4 
shows that this index reliably distinguishes 
the groups. 

[Table 4, chi-square row, further legible entries: 3.84, p > .05; 16.9, p < .001]

(f) Page 1-3 Score. Column 6 of Table 4
indicates that this index, while still giving
significant discrimination among groups, was
less efficient than either the Total Score or
the Page 4 Score.
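
The six screening signs evaluated above can be written as a small rule set. The sketch below is illustrative only: the record layout, field names, and function name are assumptions and are not part of the CMI or of the article; the cutting points are the ones quoted in the text.

```python
from dataclasses import dataclass

@dataclass
class CMIProtocol:
    """Summary counts from one questionnaire (hypothetical field names)."""
    total_yes: int          # Yes responses on the whole questionnaire
    page4_yes: int          # Yes responses on page 4 (moods and feelings)
    sections_ij_yes: int    # Yes responses in Sections I and J
    omit_etc: int           # omitted, double-answered, or annotated items
    areas_over_one: int     # body-complaint areas with more than one Yes
    pages1_3_yes: int       # Yes responses on pages 1-3 (body complaints)

# Cutting points quoted in the text: signs (a)-(d) from the test authors,
# (e) and (f) added in the present study.
THRESHOLDS = {
    "total_yes": 30,
    "page4_yes": 3,
    "sections_ij_yes": 3,
    "omit_etc": 4,
    "areas_over_one": 7,
    "pages1_3_yes": 28,
}

def positive_signs(protocol: CMIProtocol) -> list[str]:
    """Return the names of all signs at or above their cutting points."""
    return [name for name, cut in THRESHOLDS.items()
            if getattr(protocol, name) >= cut]

# Made-up protocol for illustration; not data from the study.
example = CMIProtocol(total_yes=34, page4_yes=2, sections_ij_yes=1,
                      omit_etc=0, areas_over_one=7, pages1_3_yes=25)
print(positive_signs(example))
```

Keeping the cutting points in one table makes it easy to try the alternative cutoffs that the Discussion considers for different populations.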

Chronic illness was defined in a rough, arbi- 
trary fashion as being either (a) episodic ill- 
ness severe enough to have caused at least 
two hospitalizations previous to the present 
one or (b) the present illness having currently
or at some previous time lasted for a period 
of one year or more, with some vocational 
disability. By these criteria, 23 patients out 
of the 116 comprising Groups I, II, and III 
were classified as having chronic illness. A 
chi square test indicated that the proportion 
of Group I patients in the chronic illness 
group did not differ significantly from the 
proportion of Group I patients in the entire 
group. Therefore, there was no difference be- 
tween the chronically ill and the nonchroni- 
cally ill in the presence of overt psychopa- 
thology. Each patient having a chronic illness 
was matched with a patient without chronic 
illness of the same group and age. In total 
score, 52.1% of chronic patients scored above 
the median, while 47.9% of the nonchronic 
patients scored above the median (p > .05). 
No difference was found between chronic and 
nonchronic patients in Page 4 Score (p >
.05), also using the median test (Moses,
1952).
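
A median test of this kind reduces to a 2 × 2 table of counts above and not above the combined median. The sketch below assumes 23 chronic and 23 matched nonchronic patients and converts the reported percentages to the nearest whole counts (12 and 11 above the median); those counts are an inference from the percentages, not figures given in the article, and the chi-square is computed without a continuity correction.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)

# Rows: chronic, nonchronic. Columns: above median, not above median.
# Counts are assumed from the reported 52.1% and 47.9% of 23 patients each.
chi2 = chi_square_2x2(12, 11, 11, 12)
CRITICAL_05_1DF = 3.841  # tabled chi-square value for p = .05 with 1 df
print(f"chi-square = {chi2:.3f}; significant at .05: {chi2 > CRITICAL_05_1DF}")
```

With counts this evenly split the statistic falls far below the critical value, consistent with the reported p > .05.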


Discussion 


The CMI is apparently able to distinguish 
between degrees of emotional illness. It seems 
reasonable to think that had a more exact 
criterion of maladjustment been used, the dis-
crimination among groups would have been
even sharper. 

Inspection of Table 4 reveals that for the 
present sample, best discrimination among all 
groups was provided by the total score and 
the Page 4 Score. The frequency distribution 
of Table 2 indicates that the use of total 
score of 35, 40, or 50 as cutting points would 
heighten the discrimination between known 
psychiatric and presumably nondisturbed pa-
tients. A cutting score of 50 identified 60%
of this sample of known male psychiatric pa- 
tients, while in other studies (Brodman et al., 
1952a; Arnhoff, Strough, & Seymour, 1956) 
this score identified only about 45% and 
36%, respectively. Since both of these ear- 
lier studies used outpatients, the higher scores 
of the present group could conceivably reflect 
more severe illness in the hospitalized psy- 
chiatric population. 

On the other hand, if best discrimination 
among patients admitted for medical and 
surgical conditions (Groups I, II, and III) 
is desired, the use of a cutting score of 20 is 
indicated by Table 2. The cutting score of 
30 originally suggested by the test’s authors 
(Brodman et al., 1955) would seem to give 
the best over-all discrimination. Likewise, 
from Table 3 it is seen that while cutting 
scores of 6, 7, or 8 on mood and feeling items 
give best separation of the extreme groups, 
the authors’ suggested score of 3 seems to be 
preferable when attempting to pick out emo- 
tionally disturbed medical and surgical pa- 
tients. 

These results reinforce the suggestion that 
interpretation of screening devices be varied 
according to the population tested and the 
purpose for which they are used (Meehl & 
Rosen, 1955). For instance, the present re- 
sults, when compared with other studies (Arn-
hoff et al., 1956; Brodman et al., 1952a,
1952b, 1954), suggest tentatively that V.A.
medical and surgical patients, psychiatric out- 
patients, and psychiatric inpatients may tend 
to receive higher scores than nonveteran 
populations. Though much larger samples 
would be needed to establish the existence of 
such broad population differences, the present 
study suggests that different cutting scores 
may be better for V.A. populations. 

In view of the low frequency of occurrence
of the Omit, etc., sign, and the failure to vali-
date its components when each was considered
separately (Arnhoff et al., 1956), the present
study indicates that lumping them all to- 
gether and considering a single occurrence as 
possibly noteworthy may be of value. Prob- 
ably this sign is most efficient as a gross in- 
dicator of psychopathology, inasmuch as the 
intermediate groups were not ranked in the 
predicted order. 

Lack of information on the base rate of 
emotional disturbance in V.A. general hos- 
pitals makes interpretation of the predictive 
value of the CMI difficult. However, it is 
probable that a high proportion of admissions 
to general hospitals have some degree of ob- 
servable psychopathology, judging by such 
data as Zwerling, Titchener, Gottschalk, 
Levin, Culbertson, Cohen, and Silver (1955). 
Even if it is assumed that all patients in 
Group I are, as the criterion indicates, psy- 
chiatric “normals,” a maximum false positive 
rate of 20% was obtained (Table 4), while 
the false negative rate was 24% among Group
IV. Under these conditions, use of the CMI 
would improve upon base rates if the pro- 
portion of maladjusted patients admitted 
were as low as 1:4 (Meehl & Rosen, 1955). 
The ability to pick out a fair proportion of 
probably maladjusted individuals from a large 
group makes the CMI valuable in one of its 
prime functions, that of serving as a criterion 
for the inclusion of subjects in “normal” re-
search populations.
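
The base-rate argument can be made concrete with the inequality given by Meehl and Rosen (1955): a positive sign is more often right than wrong only when P/Q exceeds p2/p1, where P is the base rate of the condition, Q = 1 - P, p1 is the valid positive rate, and p2 is the false positive rate. The sketch below plugs in the rates quoted in the text (20% false positives, 24% false negatives); the function name is illustrative.

```python
def min_base_rate(valid_positive_rate, false_positive_rate):
    """Smallest base rate P at which P/Q exceeds p2/p1 (Q = 1 - P)."""
    ratio = false_positive_rate / valid_positive_rate   # p2 / p1
    return ratio / (1.0 + ratio)                        # solve P / (1 - P) = ratio

p1 = 1.0 - 0.24   # valid positive rate: 1 minus the false negative rate in Group IV
p2 = 0.20         # false positive rate quoted in the text for Group I
print(f"P/Q must exceed {p2 / p1:.3f}; base rate must exceed {min_base_rate(p1, p2):.3f}")
```

The required ratio of maladjusted to adjusted patients comes out at roughly 1:4, in line with the figure given in the text.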

Finally, a preliminary determination of how 
and to what extent chronic physical illness is 
confounded with emotional illness in the ele- 
vation of CMI scores was made. In this group, 
when both age and degree of overt maladjust- 
ment were held constant, chronic illness in it- 
self did not seem to inflate CMI scores. Should 
this conclusion be confirmed in future work it 
would add strength to the assertion that high 
CMI scores are mainly a function of psycho- 
pathology and that use of the CMI as a psy- 
chiatric screening device in a general medical 
setting does not risk inordinate numbers of 
false positives. 

Summary 

The value of the Cornell Medical Index as 
a psychiatric screening device in a general 
medical hospital was investigated. An ordinal
scale of presumed degree of emotional illness
was obtained by classifying 116 consecutive 
medical and surgical admissions as (I) with- 
out evidence in their records of psychiatric 
disorder, (II) presence of clinically noted 
tension or psychosomatic disorder, (III) pres- 
ence of clinically diagnosed psychiatric dis- 
order. A further group of 34 consecutive 
psychiatric admissions constituted (IV) the 
“most probably ill” group. Various CMI 
scores as suggested by the authors were 
studied, and it was found that these scores 
consistently ordered the four groups in terms 
of measured psychopathology, as predicted. 
Most of the authors’ suggested cutting points 
were confirmed, but there was some evidence 
that different cutting points would be valu- 
able for different purposes. In a small sample 
of 23 cases, it was found that presence of 
chronic illness did not by itself increase the 
likelihood of psychiatric diagnosis by the 
CMI. It was concluded that the CMI is valu- 
able both as a screening device for choosing 
groups and as an individual test. 


Received July 7, 1958.


SYMBOLISM VALIDITY AND LEARNING WITHOUT
AWARENESS 


CAROLINE TAYLOR MACBRAYER 
Davidson College 


In a recent publication, Levy (1954) re- 
ported evidence against the existence of a uni- 
versal sexual symbolism and warned against 
the current practice of assuming, in blind 
analysis of projective materials, that universal 
sexual symbolism is a valid concept. Levy’s 
statement (1954, p. 45) that his results 
should not be generalized to any great extent 
beyond his experimental population of fifth- 
grade children, plus continued and renewed 
emphasis on sexual symbolism in current re- 
search (Clark, 1955; Hall, 1953), were the 
chief stimuli for the present investigation. 
The two purposes of the present experiment 
are to determine whether Levy’s (1954) nega- 
tive conclusion concerning the existence of a 
universal sexual symbolism of the Freudian 
or classical type holds true for college stu- 
dents, and to determine whether learning of 
the meaning of symbols in relation to sex, 
without awareness of what is being learned, 
can be experimentally produced. This latter 
aim was suggested by current studies on 
learning without awareness of what is being 
learned (Cohen, Kalish, Thurston, & Cohen, 
1954; Greenspoon, 1955; Philbrick & Post- 
man, 1955; Postman & Jarrett, 1952), espe- 
cially the experiment by Philbrick and Post- 
man (1955), which showed that a relatively 
simple task facilitates learning without aware- 
ness. 

Essentially, the present experiment is an 
investigation of the influence of unconscious- 
ness, as defined in at least two of the 16 ways 
discussed by Miller (1942, pp. 21-44), on 
the responses of subjects (Ss) to sexual sym- 
bols. A strictly psychoanalytic concept of un- 
consciousness (Miller, 1942, pp. 42-43) was 
used by Levy (1954), who found that un- 
conscious classical or Freudian sexual sym- 
bolism did not produce significantly greater 


learning by the paired-associates technique in 
one group of fifth-grade children, for whom 
boys’ first names were paired with abstract 
“male” figures (long, pointed) and girls’ first 
names were paired with abstract “female” fig- 
ures (round, containing), as compared with 
another group of fifth-grade children, for 
whom boys’ names were paired with “female”
figures and girls’ names with “male” figures. 
Levy (1954) also found that these same chil- 
dren did not pair the five girls’ names with 
the five female figures or five boys’ names 
with the five male figures at a significantly 
better than chance level. 

The present study is designed in a simi- 
lar, but somewhat more extended and rigor- 
ous, manner to investigate this same psy- 
choanalytic concept of unconscious sexual 
symbolism. It also attempts to investigate 
unconscious learning during the experiment; 
unconscious learning being defined here as 
learning without awareness of what is being 
learned (Miller, 1942, pp. 43-44), or learn- 
ing without being able to communicate or 
verbalize what has been learned (Miller, 1942, 
pp. 38-41 and p. 292). Hypotheses to be 
tested by the experiment are as follows: 

1. Classical or Freudian unconscious sexual 
symbolism is valid for male college students. 

2. Learning of the characteristic or charac- 
teristics of maleness or femaleness of abstract 
figures can occur without awareness on the 
part of the S of what is being learned. 

In addition to testing these two hypotheses, 
the present experiment aimed to determine, 
within the current experimental procedure, 
whether unconscious classical sexual sym- 
bolism, or learning during the experiment 
without awareness of what is being learned, 
is a stronger determiner of responses to ab-
stract sexually symbolic figures.


Method


Ss were 80 male college students in a men’s 
college, none of whom had taken a college 
course in psychology. They were divided into 
four experimental groups, each of which con- 
tained 20 Ss. The 13 freshmen, 23 sopho- 
mores, 34 juniors, and 10 seniors were as- 
signed at random to one of the four experi- 
mental groups, the number of Ss from each 
class in each of the four groups being roughly 
equivalent. 

The procedure was the same for all groups 
except that the nature and order of materials 
presented to each group differed, as indicated 
below. Materials consisted of the 10 abstract 
figures¹ used by Levy (1954), five of which 
were “male” (elongated or pointed) and five 
“female” (rounded or containing) and the 
five boys’ and five girls’ first names which 
Levy used, plus 10 different abstract figures 
designed by the author,² five male and five 
female, with ten different first names—five of 
boys and five of girls. Each S performed two 
similar tasks of attempting to memorize names 
paired with figures, one task involving Levy 
figures and names, and the other, MacBrayer 
figures and names. For each S, boys’ names 
were paired with male figures and girls’ names 
with female figures in one memory task 
(called “correct” condition in the present ex- 
periment), while in the other memory task, 
boys’ names were paired with female figures 
and girls’ names with male figures (called 
“reversed” condition in the present experi- 
ment). The nature of materials and order of 
presentation for the four experimental groups, 
designed to counterbalance for practice effects 
and any difference in difficulty between Levy 
and MacBrayer figures and names, were as 
follows: 


Group  First Memory Task            Second Memory Task 
1      MacBrayer figures, correct   Levy figures, reversed 
2      MacBrayer figures, reversed  Levy figures, correct 
3      Levy figures, correct        MacBrayer figures, reversed 
4      Levy figures, reversed       MacBrayer figures, correct 
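
The counterbalancing scheme above can be sketched in a few lines. This is only an illustration of the design logic, not part of the original procedure; the function and constant names are hypothetical. Each group receives both figure sets and both pairing conditions, with the second task always taking the figure set and condition not used in the first.

```python
from itertools import product

# Labels taken from the design described in the text; the identifiers
# themselves are hypothetical.
FIGURE_SETS = ("MacBrayer", "Levy")
CONDITIONS = ("correct", "reversed")


def other(value, options):
    """Return the element of a two-item tuple that is not `value`."""
    return options[1] if value == options[0] else options[0]


def counterbalanced_groups():
    """Build the four task orders.

    Every group meets both figure sets and both pairing conditions,
    one of each per memory task.
    """
    groups = {}
    first_tasks = product(FIGURE_SETS, CONDITIONS)
    for number, (figures, condition) in enumerate(first_tasks, start=1):
        first = (figures, condition)
        second = (other(figures, FIGURE_SETS), other(condition, CONDITIONS))
        groups[number] = (first, second)
    return groups


if __name__ == "__main__":
    for number, (first, second) in counterbalanced_groups().items():
        print(number, first, second)
```

Across the four groups, each figure set appears equally often first and second, and equally often under each condition, which is what allows practice effects and differences in figure difficulty to cancel.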


1 Two figures were slightly modified. 
2 Available on request from the author. 

Materials were presented in mimeographed 
form. One minute was allowed for memorizing 
the names of the 10 figures in the first 
task, followed by a three-minute recall period 
during which the same figures were presented 
in a different order and S was asked to write 
the names of all the figures he could recall, 
guessing for those which he could not recall. 
At the end of this three-minute recall period, 
the same procedure, using S’s second memory 
task, was repeated. The experiment was pre- 
sented to Ss as a study to determine how 
much ability to memorize improves during the 
four years in college. Each S was asked to put 
his class but not his name on his paper. Im- 
mediately after completion of the test, Ss were 
asked to comment on anything they had no- 
ticed about the experiment, such as the na- 
ture of the figures or anything they thought 
the test might measure in addition to ability 
to memorize. Four Ss were eliminated from 
the experiment because they verbalized the 
principle at least partially, and four Ss were 
eliminated because they made no errors un- 
der either correct or reversed conditions; the 
data for these Ss are not included in the re- 
sults of the 80 Ss reported in the present ex- 
periment. 


Results 


Results are presented in Table 1. There is 
clearly no significant difference in memory for 
names of figures under correct as compared to 
reversed conditions. In other words, memory 
for names of figures is not significantly facili- 
tated by boys’ names being paired with male 
figures and girls’ names with female figures, 
rather than vice versa. This agrees with Levy’s 
(1954) findings for fifth-grade children, which 
were interpreted by him as evidence against 
universal sexual symbolism. Levy assumed 
that if universal sexual symbolism exists, 
positive transfer effects in memorizing would 
result from correct pairing of figures and 
names and negative transfer effects from re- 
versed pairing. While these data reveal no evi- 
dence for the validity of classical sexual symbolism, 
an analysis of the nature of the errors in the 
present experiment, under “forced guessing” 
conditions, reveals a different story. 

Under all conditions, a significantly greater 
number of guessed (that is, incorrect) names 
were of the same sex as the name which had 
been paired with the figure in the material S 
had attempted to memorize. This is interpreted 
as evidence of learning without awareness 
of what has been learned, i.e., Ss learned 
that figures had the characteristic or 
characteristics of maleness or femaleness, as 
evidenced by the sex of their guessed names for 
the figures, without being able to verbalize 
the reason or principle on which their guesses 
were based. This finding supports the second 
hypothesis. 


Table 1 
Errors in Recall of Names of Abstract Figures 
(Ss: N = 80) 

Condition                                          M No. or Proportion of Errors    p 
Names correct (c) vs. names reversed (r) 
Same-sex errors (s) vs. opposite-sex errors (o)* 
  All conditions                                                                    <.01 
  Correct condition                                                                 <.01 
  Reversed condition                                                                <.05 
Proportions of opposite-sex errors under correct (c) 
  vs. reversed (r) conditions                                                       <.01 

* A same-sex error is one in which the guessed name of the figure is of the same sex as the name paired with the figure in 
the material S attempted to memorize; an opposite-sex error is one in which the guessed name of the figure is of the opposite 
sex from the name paired with the figure in the material S attempted to memorize. 
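
The error classification defined in the Table 1 footnote can be expressed as a short sketch. The names, the `sex_of_name` lookup, and the function names below are hypothetical placeholders chosen for illustration; the source does not list the actual first names used.

```python
from collections import Counter

# Hypothetical names standing in for the unlisted boys' and girls' names.
SEX_OF_NAME = {"Tom": "male", "Bill": "male", "Ann": "female", "Sue": "female"}


def classify_response(guessed_name, paired_name, sex_of_name=SEX_OF_NAME):
    """Classify one recall response against the name S tried to memorize.

    A same-sex error is a wrong name of the same sex as the paired name;
    an opposite-sex error is a wrong name of the other sex.
    """
    if guessed_name == paired_name:
        return "correct"
    if sex_of_name[guessed_name] == sex_of_name[paired_name]:
        return "same-sex error"
    return "opposite-sex error"


def tally(responses, sex_of_name=SEX_OF_NAME):
    """Count response types over (guessed_name, paired_name) pairs."""
    return Counter(
        classify_response(guess, paired, sex_of_name)
        for guess, paired in responses
    )


if __name__ == "__main__":
    demo = [("Tom", "Tom"), ("Bill", "Tom"), ("Ann", "Tom"), ("Sue", "Ann")]
    print(tally(demo))
```

Note that the classification depends only on the sex of the name paired with the figure in the memory task, not on the "male" or "female" form of the figure itself; that is why an opposite-sex error under the reversed condition agrees with classical symbolism.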

Further analysis of same-sex and opposite- 
sex errors reveals that under correct condi- 
tions, the number of same-sex errors (guessed 
name of figure same sex as in memory task) 
was significantly greater than opposite-sex 
errors, at better than the .01 level, whereas 
under reversed conditions, this difference is 
less marked, being significant at better than 
the .05 level, but below the .01 level of con- 
fidence. Translated into proportions of oppo- 
site-sex errors under correct as compared to 
reversed conditions (see Table 1), this differ- 
ence is highly significant. This is interpreted 
as supporting the first hypothesis, that clas- 
sical or Freudian unconscious sexual symbol- 
ism is valid for male college students. Appar- 
ently, when Ss cannot recall the name of a 
figure, there is no interference by classical un- 
conscious symbolism with guessing names of 
figures which are the same sex as the figure 
name in the memory task, if the memory task 
paired girls’ names with female figures and 
boys’ names with male figures. If, however, 
the paired names in the memory task were 
reversed, unconscious sexual symbolism does 
interfere with guessing names of figures which 
are of the same sex as names paired with the 
figures in the memory task, and a significantly 
greater proportion of opposite-sex errors oc- 
curs under reversed conditions. These oppo- 
site-sex errors under reversed conditions are, 
of course, errors in which girls’ names are 
guessed for female figures and boys’ names for 
male figures, in accord with classical sexual 
symbolism, despite the fact that in the mem- 
ory task the sex of the names given to the fig- 
ures had been reversed. 


Discussion 


Evidence for the validity of unconscious 
sexual symbolism was found in the present 
experiment. Apparently, its operation is at a 
deeper level of unconsciousness than that in- 
vestigated by Levy (1954). Its influence on 
responses to sexual symbols was found to be 
relatively weaker than the influence of un- 
conscious learning, or learning without aware- 
ness, which took place during the experiment. 
Thus, when Ss have attempted to learn names 
for figures which are of the opposite sex than 
the classical sexual symbolism of the figure 
and are required to guess for unrecalled names 
of figures, their guessed names are signifi- 
cantly more often of the same sex as those 
they attempted to memorize. At the same 
time, opposite-sex errors intrude significantly 
more often under these conditions than when 
paired names in the memory task are of the 
same sex as the classical sexual symbolism of 
the figure. 

That unconscious learning during the ex- 
periment was a stronger determiner of the 
sex of guessed names of abstract figures in 
the experiment than was classical unconscious 
sexual symbolism may be due to several fac- 
tors. Learning during the experiment, although 
Ss were unaware of or unable to verbalize 
what had been learned, had the advantage of 
greater recency and probably of greater mo- 
tivation (Miller, 1940), in the sense that Ss 
consciously wished to do well, which would 
be an indication to themselves of their own 
ability or intelligence. Furthermore, classical 
sexual symbolism has the disadvantage of re- 
pression, at least during ordinary waking con- 
ditions, whether repression is regarded in clas- 
sical psychoanalytic terms (Miller, 1942, pp. 
234-239) or simply as avoidance condition- 
ing (Eriksen & Kuethe, 1956). While it is 
certainly impossible to say how deeply un- 
conscious classical sexual symbolism is, it 
seems safe to assume that in the hierarchi- 
cal continuum of imperceptible gradations 
between consciousness and unconsciousness 
(Miller, 1942, pp. 136-137), it lies further 
below the limen of awareness than the learn- 
ing without awareness which occurred during 
the experiment, especially as only the latter 
reached awareness, or became communicable, 
for the four Ss who were eliminated from the 
experiment. 

This interpretation of the present learning 
data agrees in principle with studies of dis- 
crimination without awareness, such as that 
of Miller (1933). These studies nearly all 
show that “. . . the reliability of the sub- 
jects’ judgments increases directly with the 
intensity of the stimuli. If a valid extrapola- 
tion can be drawn from this finding, it would 
be that accuracy of perception increases as 
the stimulation approaches a supraliminal 
level” (McConnell, Cutler, & McNeil, 1958, 
p. 231). Apparently, unconscious learning 
during the experiment more nearly approached 
a supraliminal level of awareness than did 
classical unconscious sexual symbolism, which 
accounts, at least in part, for the greater in- 
fluence of the former on responses to abstract 
sexually symbolic figures in the present experiment. 
This suggests that responses to projective 
materials which are projections of unconscious 
needs and desires may be more dependent 
on recent learning, or learning which 
is near the S’s limen of awareness, than on 
more deeply unconscious or repressed proc- 
esses. The present study does not, of course, 
furnish evidence that this is necessarily true. 


Summary 


An experiment was done to determine 
whether classical sexual symbolism is valid for 
male college students. No evidence was found 
that memory for names of abstract figures 
was better when boys’ names were paired 
with “male” (elongated or pointed) ab- 
stract figures and girls’ names with “female” 
(rounded or containing) abstract figures than 
when the names were paired with the figures 
in a reverse manner. However, evidence in 
support of the validity of classical sexual sym- 
bolism was found in the nature of the errors 
or guesses of names for figures which Ss could 
not recall. Even stronger evidence was found 
of the influence of learning during the experi- 
ment without awareness of what was being 
learned. Possible reasons were discussed for 
the relatively greater influence of unconscious 
learning during the experiment than of un- 
conscious classical sexual symbolism on re- 
sponses during the experiment to abstract 
sexually symbolic figures. 


Received July 7, 1958. 


THE SOCIAL DESIRABILITY OF TRAIT DESCRIPTIVE TERMS: 
APPLICATIONS TO A SELF-CONCEPT INVENTORY¹ 


EMORY L. COWEN AND PHOEBUS N. TONGAS² 


University of Rochester 


One consequence of recent research on the 
formal attributes of verbal response behavior 
is the growing conviction, by those concerned 
with problems of assessment, that subjects’ 
(Ss’) responses to personality inventories may 
be influenced by factors quite extraneous to 
item content. Illustratively, Edwards (1953) 
reports a correlation of .87 between the prob- 
ability of endorsement of a personality de- 
scriptive statement and its social desirability 
(SD). Despite such empirical findings, the 
number of available personality inventories 
seems to proliferate rapidly. This, in part, 
may reflect a seeming simplicity in putting 
together a series of verbal propositions osten- 
sibly tapping almost any type of psychologi- 
cal variable. Considerably more difficult is 
the task of demonstrating that the variable 
in question is actually being measured by the 
verbal responses given to the items purport- 
ing to measure it. 

1 Portions of this paper were presented in an 
address to Div. 8, at the APA meetings in Washington, 
D. C., August 1958. 

2 Now at the University of Buffalo. 

With the development of “modern 
phenomenology” (Rogers, 1951; Snygg & Combs, 
1949) there has followed the inevitable rash 
of new instruments (Berger, 1952; Bills, 
Vance, & McLean, 1951; Brownfain, 1952; 
Fey, 1954; Phillips, 1951) presuming to 
assess variables of prime import within this 
frame of reference. Typical of this group of 
new measures is the Bills Index of Adjustment 
and Values (IAV; Bills, 1958). This test 
consists of a series of 49 trait descriptive 
adjectives, each of which is rated three times 
(for self concept, self-acceptance, and ideal 
self) on a series of five-point scales. The 
assumption is made, based on the underlying 
theory, that the higher the self concept (SC), 
self-acceptance (SA), and ideal self (IS), the 
better adjusted is the S. A fourth score, the 
maladjustment index, is derived from the 
summed discrepancy (D score), without 
respect to sign, between SC and IS. Since a 
discrepancy between SC and IS is taken as 
an index of conflict or emotionality with 
respect to the trait in question, the lower the 
summed D score the better adjusted is the S 
considered to be. 
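
The scoring just described can be sketched as follows. This is a minimal illustration assuming each S supplies three equal-length lists of five-point ratings, one per trait word; the function name `iav_scores` and the dictionary keys are hypothetical and are not taken from the IAV manual.

```python
def iav_scores(sc_ratings, sa_ratings, is_ratings):
    """Return summed SC, SA, and IS scores plus the D score for one S.

    The D score is the sum over traits of the absolute SC-IS difference,
    i.e. the discrepancy "without respect to sign".
    """
    if not (len(sc_ratings) == len(sa_ratings) == len(is_ratings)):
        raise ValueError("each trait needs one SC, one SA, and one IS rating")
    return {
        "SC": sum(sc_ratings),
        "SA": sum(sa_ratings),
        "IS": sum(is_ratings),
        "D": sum(abs(sc - ideal) for sc, ideal in zip(sc_ratings, is_ratings)),
    }


if __name__ == "__main__":
    # Three made-up traits instead of the full 49, purely for illustration.
    print(iav_scores([4, 2, 5], [3, 3, 4], [5, 4, 5]))
```

Taking absolute values before summing matters: a trait rated above the ideal and another rated below it both add to D rather than cancelling.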

It might justifiably be expected that one 
aspect of the validation problem for an in- 
strument of this type would be to demonstrate 
that Ss with high SC, SA, and IS scores or 
low D scores are, in fact, better adjusted than 
their opposites on some independent measure 
of adjustment. Although there has been a 
reasonable amount of “validation research” 
reported for the IAV, the preceding expecta- 
tion has not always been borne out. Illustra- 
tively it is reported (Bills et al., 1951) that 
scores above the mean on SA relate in one 
instance to improved adjustment following 
participation in a student-centered teaching 
situation (as might be expected from theory) 
and in another instance to a higher incidence 
of psychotic signs on the Rorschach (quite 
contrary to theoretical expectations). Subsequently, 
in a larger Rorschach study (Bills, 
1953a), a series of 32 hypotheses, dealing 
with specific test signs which might be ex- 
pected to differentiate extreme high from ex- 
treme low SA scores, was proposed. Paren- 
thetically, 11 of these 32 hypotheses called 
for no differences between groups, thus hav- 
ing a .95 probability of confirmation. Of the 
remaining 21 hypotheses, 10 yielded group 
differences significant at .05 or beyond. Closer 
examination of these “confirmed predictions,” 
however, indicates that in most instances the 
high SA scorers (theoretically, the better ad- 
justed group) are significantly more malad- 
justed in terms of the traditional meanings 
attached to the specific Rorschach signs in 
question (e.g., less FC, and more CF, lower 
F+%, higher dominance of C over M, etc.). 
While it may be appropriate to conclude that 
the high and low SA scorers are “differentiable 
with respect to certain important per- 
sonality characteristics as measured by the 
Rorschach” (Bills, 1953a, p. 38), such an 
empirical demonstration should not be con- 
fused with the validation of underlying theo- 
retical constructs. By and large the findings 
are opposite in direction to what might be de- 
duced from theory, as well as to other em- 
pirical data (LaFon, 1954). 

In still another study (Bills, 1953b), the 
prediction was made that words for which 
lower IAV ratings were given on a second test 
administration would also have slower reac- 
tion times in a repeat word-association test. 
Such a prediction is based on the theoretically 
defensible notion that both lowered ratings 
and greater latency of association may be 
symptomatic of conflict or emotionality. The 
results of the experiment, while statistically 
significant, were directionally opposite to the 
prediction. The following statement is ad- 
vanced in seeking to explain these unantici- 
pated findings: 


. . . the ability to lower a trait rating on re-test may 
be indicative of a lower degree of defensiveness, and 
it would be predicted that such a reaction would be 
accompanied by a decrease in emotionality and, 
therefore by a decrease in reaction time to a trait in 
free association (Bills, 1953b, p. 137). 


If a lowered SC rating on an IAV retest 
can be construed as an index of decreased de- 
fensiveness, it may be equally defensible to 
argue that an initially low self rating could 
be due to lowered defensiveness, rather than 
to conflict or emotionality, as has been as- 
sumed. Unless procedures can be specified be- 
fore the fact, by which we can discriminate 
the high SC score representing good adjust- 
ment from the high SC score representing de- 
fensiveness, we are operating within a closed 


system in which the results of a given experi- 
ment, irrespective of their direction, can be 
interpreted as confirming the underlying the- 
ory. Hence, Bills’ (1953b) conclusion, on the 
basis of the experiment cited, that “. . . the 
index is a valid measure of emotionality and 
changes in emotionality” (p. 137) appears to 
be somewhat premature. When an instrument 
yields significant findings opposite in direc- 
tion to what might be predicted on the basis 
of its underlying theory, it may well be dis- 
criminatory, but it is not valid in the sense of 
offering support for the relevant theoretical 
concepts. 

Perhaps the problem would be simplified 
if, in some consistent way, obtained findings 
were always contrary to what may be pre- 
dicted beforehand. But this does not appear 
to be the case since studies are cited (Bills, 
1958) in which high SC scores show up as 
better adjusted on other measures (e.g., ab- 
sence of psychosomatic symptoms). 

Thus far, most attempts to account for 
findings which do not support underlying the- 
ory in this area (Bills, 1953b; Fey, 1954) 
have been based upon implications about the 
vicissitudes of verbal response characteristics 
in human Ss. The maladjusted S, it is argued, 
appears well adjusted because of his defen- 
siveness. We may presume that part of the 
operational meaning of the concept defensive- 
ness, when it is invoked in this context, is the 
tendency to respond in a socially desirable 
manner in instances where a true response 
would place the S in an unfavorable light. As 
yet, the extent to which social desirability 
stereotypes may account for responses to an 
inventory such as the IAV, either for specific 
individuals or for a group of Ss, is undetermined. 
It is precisely this type of information 
which the present research seeks to establish. 

Theoretical inconsistencies aside, the brunt 
of the reported validity data for the IAV 
(Bills, 1958) is in terms of demonstrated re- 
lationships between it and other similarly con- 
structed self-concept measures (e.g., Cowen: 
1954, 1956; Omwake, 1954) and/or more tra- 
ditional paper-and-pencil inventories (Bills, 
1958). Such correspondences may, however, 
indicate little more than constancy in verbal 
behavior or social desirability stereotypes, and 
should not per se be taken as a final criterion 
of validation. 

Quite recently, there have been several em- 
pirical and theoretical papers directed spe- 
cifically toward the problem of possible re- 
lationships between social desirability and 
endorsement either of IAV type trait descrip- 
tions (Kenny, 1956) or closely related self- 
descriptive statements from Q-sort pools (Ed- 
wards: 1955, 1957; Edwards & Horst, 1953; 
Kenny, 1956). Most relevant is Kenny’s 
study (1956) in which SD rankings were 
established for 25 trait descriptive adjectives 
for a college sample. A second independent 
sample subsequently rated themselves on these 
same adjectives both for “real self” and “ideal 
self.” Correlations of .81 and .82 were re- 
ported between mean endorsement and SD 
for these two variables. These data suggest 
strongly the possibility that a structurally 
similar formal test instrument such as the 
IAV might be highly saturated with SD. In 
essence, the purpose of the present study was 
to determine the extent to which this was the 
case. Such an inquiry was made possible by 
an earlier study in the present series (Cowen, 
in press) in which mean SD values for a 
sizable pool of trait descriptive terms, includ- 
ing all 49 adjectives comprising the IAV, were 
established for a college sample. 

Our hypotheses may be summarized as fol- 
lows: 

First, there will be a high positive correla- 
tion between both mean SC and mean IS en- 
dorsement and SD. Secondarily, we antici- 
pated an inverse relationship between mean 
SC-IS discrepancy and departure of the mean 
SD value from the scale midpoint (i.e., the 
more extreme the SD of an item, the more 
likely are both SC and IS ratings to be af- 
fected by it; hence, the smaller the SC-IS 
discrepancy) (Kenny, 1956). 


Procedure 


A sample of 100 college Ss (59 male and 
41 female), independent of the SD norma- 
tive group (Cowen, in press), completed the 
standard IAV in group administration. For 
each IAV adjective, mean SC, IS, and SC-IS 
discrepancy scores were determined. The mean 
SC and IS endorsement values were correlated 
with SD normative values for each trait- 
descriptive term. Mean D scores for each adjective 
were correlated with the departure of the 
SD mean from the midpoint, four, of the SD 
scale. 
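
The item-level analysis described above can be sketched as follows. This is an illustration only: the data layout (one list of ratings per S, one rating per adjective), the function names, and the choice to average absolute SC-IS differences across Ss are assumptions, since the source does not spell out the computational details.

```python
from math import sqrt
from statistics import mean

SD_MIDPOINT = 4.0  # midpoint of the seven-point SD scale named in the text


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def item_level_analysis(sc, ideal, sd_norms):
    """Correlate item means with SD norms, one data point per adjective.

    `sc` and `ideal` hold one list of ratings per S; `sd_norms` holds the
    normative SD value of each adjective, in the same item order.
    """
    items = range(len(sd_norms))
    mean_sc = [mean(s[i] for s in sc) for i in items]
    mean_is = [mean(s[i] for s in ideal) for i in items]
    # Assumption: mean D is the average absolute SC-IS difference across Ss.
    mean_d = [mean(abs(a[i] - b[i]) for a, b in zip(sc, ideal)) for i in items]
    sd_extremity = [abs(value - SD_MIDPOINT) for value in sd_norms]
    return {
        "r_SC_SD": pearson_r(mean_sc, sd_norms),
        "r_IS_SD": pearson_r(mean_is, sd_norms),
        "r_D_extremity": pearson_r(mean_d, sd_extremity),
    }


if __name__ == "__main__":
    # Made-up ratings for two Ss on three adjectives, for illustration only.
    sc = [[5, 1, 4], [4, 2, 3]]
    ideal = [[5, 1, 5], [5, 1, 4]]
    print(item_level_analysis(sc, ideal, sd_norms=[6.5, 1.2, 5.0]))
```

The unit of analysis here is the adjective, not the individual S: each correlation is computed over item means, so it describes how items behave on average rather than how any one S responds.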

For reasons to be noted below it seemed de- 
sirable to run a second, cross-validating, sam- 
ple on a somewhat modified version of the 
IAV. These latter data were collected and 
analyzed exactly as the original data had 
been. The new sample consisted of 23 male 
and 26 female Ss. 


Results and Discussion 


The basic hypotheses. All analyses were 
run separately for males and females. In each 
instance, however, the findings were virtually 
identical for the two sexes; hence, all correla- 
tions are reported in terms of the total groups. 

Looking only at the data from the initial 
study we find, in support of our hypotheses, 
a correlation of .906 (p < .001) between SC 
endorsement and SD. This marked relationship 
is, in fact, every bit as high as the 
reliability of the former measure (Bills, 1958). 
An even higher correlation of .958 (p < .001) 
is observed between IS and SD. On the other 
hand, the predicted correlation between the 
SC-IS discrepancy and extremity of SD was 
not found (r = -.053). 

Analyzing the actual SD values of the 49 
IAV adjectives, it was apparent that they 
were not normally distributed in terms of the 
seven-point SD scale used in the normative 
study. Forty of the 49 terms had SD values 
ranging from 5.0-6.7, while the remaining 
nine ranged from 1.1-2.2. This grossly atypical 
distribution could theoretically have seriously 
affected the observed correlations. In 
the instances of the SC and IS correlations, 
the unusual bipolarity could have operated to 
inflate the relationships spuriously, while the 
clustering of 40 of the 49 terms at the posi- 
tive end of the SD continuum could have had 
an opposite effect. Additionally, since the de- 
viation of the adjectives from the midpoint 
of the SD scale was limited to a range of 
1.0-2.9, the correlation between this variable 
and SC-IS discrepancy also could have been 
restricted artificially. 
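
The concern about bipolarity can be illustrated with a small numerical sketch. The values below are invented for illustration and are not the study's data: two clusters of items are built so that SD value and mean endorsement are unrelated within each cluster, yet the pooled correlation is high simply because the clusters sit far apart.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented item values: (SD value, mean endorsement) for two clusters.
LOW_SD = [1.0, 2.0, 1.0, 2.0]
LOW_ENDORSEMENT = [1.0, 1.0, 2.0, 2.0]
HIGH_SD = [5.0, 6.0, 5.0, 6.0]
HIGH_ENDORSEMENT = [4.0, 4.0, 5.0, 5.0]

within_low = pearson_r(LOW_SD, LOW_ENDORSEMENT)
within_high = pearson_r(HIGH_SD, HIGH_ENDORSEMENT)
pooled = pearson_r(LOW_SD + HIGH_SD, LOW_ENDORSEMENT + HIGH_ENDORSEMENT)

if __name__ == "__main__":
    print(within_low, within_high, round(pooled, 3))
```

In this constructed case the within-cluster correlations are zero while the pooled correlation exceeds .90, which is the kind of spurious inflation that a second item sample spread across the full SD range is meant to rule out.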

The second study was undertaken to rule 
out the possible attenuating effects of these 
distributional atypicalities. Forty-eight new 
adjectives were selected from the original 
normative study (Cowen, in press) so as to 
cover the full range of the SD continuum 
(1.1-6.6). This, of course, also had the effect 
of increasing the SD deviation scores, the 
range going from .1-2.9. Although the trait- 
descriptive terms used for the second study 
were different from those comprising the 
original IAV, the instructions, form of ad- 
ministration, and methods of analyses were 
the same as the ones used with the original 
sample. 

The results of this second study were essentially 
identical to those obtained with the 
initial sample. In this instance, the correlations 
between SC and IS endorsement and 
SD were .900 (p < .001) and .978 (p < .001), 
respectively, while the correlation between 
SC-IS discrepancy and extremity of 
SD was -.039. Since these findings occurred 
notwithstanding the changes in item content 
as well as their ordering on the SD con- 
tinuum, it appears that the relationship be- 
tween SC and IS endorsement and item SD 
may generalize considerably beyond the IAV 
test. 

Implications. The very high degree of rela- 
tionship between both SC and IS endorsement 
and SD raises question as to whether the for- 
mer two have meaning independent of the 
latter. Given the known reliability of the self- 
concept measures, as much of the variance as 
can be accounted for is accounted for by item 
SD value. On the basis of these findings, it 
does not appear to be defensible to refer to 
the IAV SC and IS scales as indices of self 
concept and ideal self. These measures are 
so heavily saturated with SD that it would 
be more parsimonious to say, when they are 
found to discriminate empirically, that the ob- 
served discriminations are due to differential 
SD stereotypes rather than to self-conceptual 
variations. If this position is a sound one, 
when the interpretive framework is applied to 
several of the studies given earlier critical re- 
view, one might perhaps be tempted to con- 
clude: (a) that Ss markedly influenced by 
SD stereotypes give more maladjusted Ror- 
schachs (Bills, 1953a) than Ss who do not 
respond in terms of such stereotypes, and (b) 
that for those items where Ss’ response became 
more socially desirable on an IAV retest 
(Bills, 1953b), word association reaction 
response latencies also increase. Although di- 
rect and independent test of these rewritten 
generalizations is clearly preferable to the 
present type of speculation, they may repre- 
sent a more parsimonious position than do the 
original generalizations (Bills: 1953a, 1953b). 

The SC-IS discrepancy score, on the other 
hand, appears to be quite independent of 
the contaminating effects of SD stereotypes. 
This per se, while it does not establish the 
validity of this index as a measure of tension 
or conflict, does, at least indirectly, buttress 
the findings of other studies purporting to 
offer such evidence (Cowen, Heilizer, & Axel- 
rod, 1955; Roberts, 1952). However, despite 
our attempts to maximize the range of SD 
deviation scores in the second study, mean 
SC-IS D scores were limited in range from 
.2-1.4 in both studies. Quite probably this is 
due to the fact that the IAV measure, as now 
constituted, is only a five point scale. The 
narrow range of D scores observed would 
tend to have a limiting effect upon the magni- 
tude of the correlation coefficient. Perhaps by 
broadening the range of the IAV rating scales, 
a somewhat more appropriate test of the sec- 
ond hypothesis could be provided. 


Summary 


Broadly stated, the purpose of the present 
study was to assess the degree of relationship 
between endorsement of items measuring as- 
pects of self regarding attitude and item so- 
cial desirability. 

One hundred Ss completed the Bills Index 
of Adjustment and Values. Mean self concept, 
ideal self, and SC-IS discrepancies were com- 
puted for the 49 constituent trait-descriptive 
terms. The first two of these measures were 
found, as predicted, to correlate very highly 
with independently established SD values for 
these same items. Contrary to prediction, SC-IS 
discrepancy failed to correlate with deviation 
of the SD mean from the scale midpoint. 
Since the IAV adjectives were found to be 
both skewed and bipolar with respect to their 
SD values, a new series of 48 adjectives 
spread out equally across the full range of 
the SD continuum was selected and given to 
a second sample of 49 new Ss. Using constant 
methods of test administration and analyses, 
the results obtained with the second sample 
were identical to those of the original study. 

We have concluded that the self concept 
and ideal self measures of the IAV are so 
heavily saturated with SD as to lose mean- 
ing independent of the latter variable. Some 
implications of this basic generalization have 
been considered. 


Received July 18, 1958. 


TWO QUESTIONS: 
A REPLY TO COWEN AND TONGAS 


ROBERT E. BILLS 


Alabama Polytechnic Institute 


The Cowen and Tongas paper seems to 
have made a significant contribution to un- 
derstanding a portion of the basis on which 
self ratings on the Index of Adjustment and 
Values (IAV) are made. For perspective, 
though, two questions should be asked. 


What Is the Index Used For? 


The first question is: What is the index 
used for? 

In the writer’s opinion, the emphasis of the 
Cowen and Tongas paper does not reflect 
properly what has gone before in the way of 
research concerns and manner of use of the 
index. Most of the work on the index has 
centered around the “acceptance” scores. 

The primary reason for designing the IAV 
(Bills, Vance, & McLean, 1951) was to de- 
velop a measure of self-acceptance. To obtain 
this measure, subjects (Ss) are asked to an- 
swer three questions for themselves for each 
of 49 trait words. Ss tell how they see them- 
selves in respect to the trait, how they feel 
about being this sort of person, and how they 
would like to be in respect to the trait. The 
second of these ratings, cumulated for the 49 
trait words, has been called acceptance of self. 
The cumulated difference between the first 
and third ratings is called the discrepancy 
score. Discrepancy scores are correlated —.77 
(for an N of 175) or —.67 (for an N of 300) 
with acceptance of self (Bills et al., 1951; 
Bills, 1958). The distributions of discrepancy 
scores and acceptance of self scores are ap- 
proximately normal (Bills, 1958, pp. 18-19, 
22-23, 36-37, 40-41). 

For several reasons, including one raised by 
Cowen and Tongas, we later developed an 
“others” form of the index (Bills: 1953, 


1958) in addition to the “self” form described 
above. One of the reasons for this was the 
dilemma posed by high self-acceptance scores 
being indicators of either high self-acceptance 
or of defensiveness—a point which Cowen and 
Tongas also make. To complete the “others” 
form of the index, an S thinks of his peer 
group and completes the index as he believes 
the average member of this group would com- 
plete it for himself. This procedure seems to 
have resolved the dilemma (Bills: 1953, 
1958). A review of the literature will show 
the reader that the validation work for the 
index has centered around the “acceptance” 
scores from these two forms of the index. In 
practice, these are the scores commonly used. 

This being the case, it does not seem neces- 
sary to determine if assumptions one and 
three as stated in the second paragraph of the 
Cowen and Tongas paper have been made by 
this writer. Nor does it seem too important, 
in the IAV context, to examine the lack of 
normality of distribution of the social desir- 
ability of the index traits. And, we will leave 
it to the interested reader to determine for 
himself if “the brunt of the reported validity 
data for the IAV is in terms of demonstrated 
relationships between it and other similarly
constructed self-concept measures and/or more 
traditional paper-and-pencil inventories.” 

Study and use of the IAV have been con- 
cerned primarily with the “acceptance” scores 
from the “self” and “others” forms. Although 
Cowen and Tongas had data on the self-ac- 
ceptance scores, they do not report using it. 
They did use discrepancy scores which are 
significantly correlated with self-acceptance, 
and it is important to note that they did not 
find discrepancy and social desirability sig- 
nificantly correlated. 


What Does the Cowen and Tongas Method 
of Statistical Analysis Yield? 


The second question is this: What does 
the Cowen and Tongas method of statistical 
analysis yield? 

In the writer's opinion, the Cowen and 
Tongas method is not directly comparable to 
methods previously used to report IAV re- 
search. 

The Cowen and Tongas technique for cal- 
culating correlations between self concept and 
social desirability and ideal self and social 
desirability says, in effect, that the mean of 
the ratings given by Ss for each of the 49 
IAV trait words is highly correlated with 
mean social desirability ratings of the traits. 
This technique is significantly different from 
that used to calculate the reported reliabili- 
ties and the validity of the index. The Cowen 
and Tongas technique, in effect, ignores the 
individual variation given by raters. It says 
that the average ratings for each of the trait 
words are highly correlated with the average
social desirabilities of these traits. This seems 
to be saying that the average person rates 
himself as average in social desirability for 
each of the trait words. 

The Cowen and Tongas method assumes 
(since only the means of ratings on trait words
are used) that the data can be handled as if 
all Ss gave the average rating on a trait word. 
This has the same effect as saying that all Ss 
gave the same rating to a trait word as all 
other Ss. This is obviously a questionable as- 
sumption when information given in the index 
Manual (Bills, 1958) on distribution of rat- 
ings for each of the trait words is examined. 
These data show that considerable variation 
exists in the ratings for each trait and that 
the placing of one score (in this case an av- 
erage rating) on each trait obscures all indi- 
vidual variation, and this is considerable. 
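
The point about averaging can be illustrated with a small sketch. The ratings and desirability values below are hypothetical and are not taken from the index Manual; they are chosen only so that the deviations of the two Ss cancel in the trait means. Under that assumption the correlation computed from trait means is perfect, while the correlation for each S taken singly is much lower.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical mean social desirability values for six trait words.
desirability = [2, 3, 4, 2, 3, 4]

# Hypothetical self ratings (1-5) from two Ss. Each S departs from the
# desirability values by one point, in opposite directions, so the
# departures cancel when the ratings are averaged trait by trait.
ratings = [
    [3, 2, 5, 1, 4, 3],
    [1, 4, 3, 3, 2, 5],
]

# Trait-by-trait means across Ss: the only quantity used when means are paired.
trait_means = [sum(column) / len(column) for column in zip(*ratings)]

r_of_means = pearson(trait_means, desirability)
r_per_subject = [pearson(row, desirability) for row in ratings]

print("r computed from trait means:", round(r_of_means, 3))
print("r for each S separately:", [round(r, 3) for r in r_per_subject])
```

The sketch is not a reanalysis of any published data; it shows only that a correlation between sets of means carries no information about the variation among individual raters.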

But having calculated the correlation be- 
tween self concept and social desirability and 
ideal self and social desirability by the above 
method, the authors switch to another basis 
for their implication. They compare the above 
correlations with split-half and test-retest re- 
liabilities obtained from column scores. The 
reliabilities reported for the IAV were cal- 
culated from odd-even scores or column to- 
tals in test-retest situations. In other words, 
Cowen and Tongas arrived at an average rat-
ing for each trait word on the index and an
average rating of the social desirability of 
each trait word, paired these, and then pro- 
ceeded to correlate. They then compared this 
coefficient with that obtained by totaling col- 
umns for each individual and pairing column 
scores from split halves or from test-retest. 

Obviously, these two methods are not di- 
rectly comparable and the implication cannot 
be drawn that “Given the known reliability of 
the self-concept measures, as much of the 
variance as can be accounted for is accounted 
for by item SD value.” The two techniques 
yield different results as will be shown. If 
theirs is a valid procedure, it is difficult to
see how self concept and ideal self, which are
correlated only .55 with each other for an N
of 300 cases (Bills, 1958, p. 62), could be 
correlated .91, .96, .90, and .98 with a third 
variable, social desirability. 

Using the formula provided by McNemar
(1949, p. 142) and designating the r between
self concept and social desirability as r1, be-
tween ideal self and social desirability as r2,
and between self concept and ideal self as r3
and using .90 for r1 and .98 for r2, we can
calculate the limiting values for r3 (that the
results would be the same if we used .91 and
.96 instead of .90 and .98 can be checked by
the reader). In this case, the limits range
from .78 to .98. But r3 for an N of 300 cases
as has already been stated is .55. Even with
as few cases as 49 (the smaller of the groups
used by Cowen and Tongas) an r of .55 based
on 300 cases is significantly smaller than an r
of .78 based on 49 cases at less than the .01
level of confidence.
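
The argument above can be sketched numerically. Two assumptions are made here that the text does not confirm: that the formula in question is the usual bound on a third correlation given the other two, r1·r2 ± sqrt((1 − r1²)(1 − r2²)), and that the comparison of the two coefficients uses Fisher's r-to-z transformation. The function names are illustrative only. Under these assumptions the limits come out near .80 and .97, close to but not identical with the .78 to .98 quoted above, and the difference between .55 (N = 300) and .78 (N = 49) gives a normal deviate of about 2.7.

```python
from math import atanh, sqrt


def third_r_limits(r1, r2):
    """Limits on the correlation between two variables, given the
    correlation of each with a common third variable (assumed formula)."""
    half_width = sqrt((1 - r1 ** 2) * (1 - r2 ** 2))
    return r1 * r2 - half_width, r1 * r2 + half_width


def fisher_z_difference(r_a, n_a, r_b, n_b):
    """Normal deviate for the difference between two independent
    correlations, using Fisher's r-to-z transformation (assumed test)."""
    standard_error = sqrt(1 / (n_a - 3) + 1 / (n_b - 3))
    return (atanh(r_b) - atanh(r_a)) / standard_error


# Values taken from the text: r1 = .90, r2 = .98.
lower, upper = third_r_limits(0.90, 0.98)

# Observed r of .55 (N = 300) against the .78 limit (N = 49).
z = fisher_z_difference(0.55, 300, 0.78, 49)

print("limits for r3:", round(lower, 3), "to", round(upper, 3))
print("normal deviate for .55 vs .78:", round(z, 2))
```

A deviate of this size exceeds the two-tailed .01 critical value of 2.58, which agrees with the conclusion stated in the text.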


Received July 18, 1958. 
The present paper represents an attempt to
explore the “stimulus value” of Rorschach’s 
ten inkblots. It is concerned with the global 
“meaning” which is attributed to them by a 
group of young normal adults. 

Over-all meaning and symbolism of some 
of the inkblots have occasionally been ascribed
intuitively by Rorschach workers. We have 
undertaken an experimental study of the 
meaning of these stimuli which is operation- 
ally defined by the method called the semantic 
differential (Osgood, Suci, & Tannenbaum, 
1957). The rationale for this instrument has 
been developed and discussed by Osgood 
(1952) and need not be detailed in the pres- 
ent report. 

Procedure 


Osgood’s semantic differential consists of 
50 scales which are defined by polar adjec-
tives, i.e., adjectives which are opposite in
meaning to each other. These poles are sepa-
rated by seven spaces, one of which may be
rated by seven spaces, one of which may be 
checked in order to indicate the position of 
the concept or stimulus rated in relation to 
the opposite adjective. An example of such a 
scale is as follows: 


Factor analytic studies of the scales with 
sizable samples of subjects (Ss) rating a va- 
riety of concepts have yielded three major 
factors: 

I. Evaluative; II. Potency; and III. Ac-
tivity.

For the purposes of our investigation, 20 
scales of the semantic differential were se- 
lected by the author. Two criteria governed 
this selection—relatively high loading on the 
aforementioned factors and suitability to the 
material to be judged. The 20 pairs of adjec- 
tives employed are listed in Table 1. Factor 
I (Evaluative) is represented by Items 1, 3,
5, 6, 7, 8, 9, 10, 13, and 16; Factor II (Po- 
tency), by Items 2, 4, 12, 15, 18, and 20; 
and Factor III (Activity), by Items 11, 14, 
17, and 19. 

The 20-item semantic differential was ad- 
ministered to 66 (28 males and 38 females) 
undergraduate students in elementary psy- 
chology. The 10 Rorschach cards (by means 
of slides) were projected on a screen one at 
a time. The Ss were given three minutes’ 
time to check the items on the semantic dif- 
ferential. Previous tryouts indicated that the 
time was quite ample even for the slowest Ss. 
Standard instructions (Osgood et al., 1957) 
were given. Thus, a total of 200 check marks 
(10 stimuli x 20 scales) were obtained for 
each participant in the experiment. 


Statistical Treatment of Data 


Although each response (check mark) on the 
semantic differential has two essential proper- 
ties—direction and distance—along the polar 
continuum (or quality and intensity of mean- 
ing), our main concern for the purposes of 
this exploratory study was that of direction 
of meaning. Consequently, all the checks, 
from 1 to 7, were dichotomized, thus using 
direction only. The number of checks on 
Space 4, the midpoint of the continuum, was 
equally divided between the two dichoto- 
mized groups of Ss, i.e., those checking Spaces 
1-3 and those checking 5-7. By doing that,
the data were prejudiced to some extent in 
favor of the null hypothesis that there are no 
differences in the directions of the scale rat- 
ings of the Rorschach cards. 

The significance of the differences between
the number of Ss checking each direction (the
dichotomized data for each scale) was then 
calculated by means of the chi square tech-
nique. Three sets of data were obtained: for
the total sample, for the male group, and for 
the female group separately. Each set of data 
represents 200 chi square values (10 x 20). 
A total of 600 chi square figures was thus ob-
tained.¹
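
The treatment described above can be sketched as follows. The sketch assumes that each scale was tested with a one-sample chi square (1 df) against an equal split between the two directions, which the text implies but does not state outright. The function names and the example check marks are hypothetical; the critical values are the standard ones for 1 df.

```python
def dichotomize(checks):
    """Split seven-point checks into two directions.

    Spaces 1-3 count toward one pole and Spaces 5-7 toward the other;
    checks on Space 4 are divided equally between the two groups.
    """
    low = sum(1 for c in checks if c <= 3)
    high = sum(1 for c in checks if c >= 5)
    mid = sum(1 for c in checks if c == 4)
    return low + mid / 2, high + mid / 2


def chi_square_equal_split(low, high):
    """One-sample chi square (1 df) against an expected 50/50 split."""
    expected = (low + high) / 2
    return ((low - expected) ** 2 + (high - expected) ** 2) / expected


# Standard critical values of chi square for 1 df.
CRITICAL = {".05": 3.841, ".01": 6.635, ".001": 10.828}


def levels_reached(chi2):
    """Return the significance levels whose critical value is met."""
    return [level for level, value in CRITICAL.items() if chi2 >= value]


# Hypothetical checks by 66 Ss on one scale for one card.
checks = [2] * 40 + [4] * 6 + [6] * 20
low, high = dichotomize(checks)
chi2 = chi_square_equal_split(low, high)

print("dichotomized counts:", low, high)
print("chi square:", round(chi2, 2), "levels reached:", levels_reached(chi2))
```

Dividing the midpoint checks equally pulls both counts toward the expected value, which is why the procedure favors the null hypothesis.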

Results 


Of the 200 chi square values for the total 
sample, 112 were found to be significant at 
the .05 level to beyond the .001 level. The 
presentation of the levels of significance for 
all scales and Rorschach cards appears in 
Table 1. 

Using the direction of the significance of 
differences, Rorschach’s inkblots may be de- 
scribed by means of the following polar adjec- 
tives: 


Card I: large, strong, ugly, dirty, distasteful, cruel, 
unpleasant, ferocious, heavy, cold, thick, active, 
rough, and rugged 

Card II: good, strong, clean, kind, pleasant, happy, 
peaceful, hazy, and honest 

Card III: good, beautiful, clean, tasty, kind, pleas- 
ant, happy, peaceful, light, clear, thin, honest, active, 
smooth, fast, and delicate 


1 The assistance of James C. Lingoes in processing 
the data through the Michigan State University 
digital computer (MISTIC) is hereby gratefully ac- 
knowledged. 
Card IV: bad, large, ugly, strong, dirty, distaste-
ful, worthless, cruel, unpleasant, sad, ferocious, heavy,
thick, dishonest, active, rough, and rugged 

Card V: small, light, clear, thin, active, smooth, 
fast, and delicate 

Card VI: large 

Card VII: good, beautiful, weak, clean, tasty, 
valuable, kind, happy, pleasant, peaceful, light, clear, 
thin, honest, passive, smooth, and delicate 

Card VIII: good, clean, pleasant, happy, peaceful, 
and hazy 

Card IX: good, large, beautiful, strong, clean,
tasty, valuable, kind, pleasant, happy, hot, honest, 
active, and fast 

Card X: good, beautiful, clean, pleasant, happy,
light, thin, active, fast, and delicate 


It is interesting to note that not all cards 
show clear-cut agreement in the characteriza- 
tions based on the directions in the polar con- 
tinuum. The meaning of some of the cards 
such as IV, VII, III, I, and IX, is delineated 
in considerable detail, whereas in some of the 
others, comparatively few significant differ-
entiations are made. This is particularly noted
on Card VI where only one significant differ-
ence was obtained. A summary of the num-
bers of significant differences for each card,
including a breakdown of the findings with 
the male and female groups separately, ap- 
pears in Table 2. 


Table 1


Level of Significance (Based on Chi Square) of the Differences in the Incidence of Subjects at
Each Polar End in Judging Rorschach's Cards


Scale (polar adjectives)

 1. good-bad
 2. large-small
 3. beautiful-ugly
 4. strong-weak
 5. clean-dirty
 6. tasty-distasteful
 7. valuable-worthless
 8. kind-cruel
 9. pleasant-unpleasant
10. happy-sad
11. ferocious-peaceful
12. heavy-light
13. clear-hazy
14. hot-cold
15. thick-thin
16. honest-dishonest
17. active-passive
18. rough-smooth
19. fast-slow
20. rugged-delicate

[The significance entries for Cards I through X are not legible.]

* Significant at the .05 level.
** Significant between the .01 and .002 level.
*** Significant beyond the .001 level of confidence.


Table 2


Number of Significant Chi Squares for Each Card for Males, Females, and Combined Sample


            N    I   II  III   IV    V   VI  VII VIII   IX    X  Total
Males      28    6    4   10   14    7    1   11    2    8    4     67
Females    38   11    7   16   17    4    0   12    2   11    6     86
Combined   66   14    9   16   17    8    1   17    6   14   10    112


It may further be noted, in Table 2, that
the number of significant chi square values
for the individual sex groups is smaller than
for the entire sample. This is readily under-
stood, since the numbers are smaller and a
smaller number of chi squares reaches the de-
sired level of significance. Also, the number
of significant differences for the male group
is smaller than for the female group. This dif-
ference is also explained by the smaller num-
ber of males when compared with the females.
However, what is of particular interest is that
no marked sex differences in the direction
along the polar continuum of the scales were
noted. The males and females showed differ-
ences in the same direction. In no instance
were there significant differences for males
and females in the opposite direction. They
never cancelled each other out when com-
bined in the total sample.

Differences in the meanings of the inkblots
are further noted in the direction and in the
extent to which the three main factors were
represented in the evaluation of the cards. By
scoring the midpoint of each scale as 0 and
the remaining points from +3 to -3, we ob-
tained average factor scores for each card.
The plus signs indicate relatively high Evalua-
tion, Potency, and Activity; the minus signs,
the opposite (Table 3).


Discussion


In the first place, although each card may
have a unique meaning for each individual in
the group, there are considerable areas of
agreement indicating that the meanings are
shared by large numbers of people; i.e., there
is a commonality of the meaning reflected in
the group results. This finding is consonant
with Richards’ (1958) observation concern-
ing the individuality of the Rorschach figures.
The possible exception may be Card VI on
which there is little agreement (one scale out
of 20) as to its meaning. To be sure, there is
considerable overlapping in the adjectival de-
scription of some pairs of cards (such as II
and VIII, III and VII); however, in no in-
stance is the resulting profile of any one card
exactly like any other in the series. Secondly,
the factors into which the scales converge
tend to underline further the differences in
the meanings; the Evaluative, Potency, and
Activity factors have different directions and
strengths in the meanings of the several cards.
Thirdly, the meanings of the cards seem to be
the same for males and females; no signifi-
cant sex differences with respect to the reac-
tions to the cards were obtained.

These findings must further be related to
the available data and hypotheses concerning
the Rorschach test. One may address oneself
primarily to two major issues which are in-
terrelated: “how do our findings relate to the
symbolic meaning attributed to the several
cards by Rorschach workers?” and “what in-
formation is gleaned concerning the reaction
of subjects to the color cards?” The latter is an
important theoretical issue in Rorschach theory.


Table 3


Factor Scores for the Ten Rorschach Cards


Factors           Scores (Cards I through X)
I (Evaluative)    -58 +39 +103 98 +.25 +05 +4109 +.21 +.74
II (Potency)      + 90 +.03 - 81 +1.69 - .65 -.23 - +.02 +.51 - 41
III (Activity)    +28 --17 + .54 +55 -17 25 +.76 +.67 4
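
The factor scoring described in the Results (midpoint scored 0, remaining points from +3 to -3, averaged within each factor) can be sketched as follows. The item lists are those given in the Procedure. The sketch assumes that Space 1 lies next to the first-named adjective of each pair (good, strong, active, and so on) and is scored +3; the text does not state this orientation. The function names and the two example protocols are hypothetical.

```python
# Items belonging to each factor, as listed in the Procedure.
FACTOR_ITEMS = {
    "Evaluative": [1, 3, 5, 6, 7, 8, 9, 10, 13, 16],
    "Potency": [2, 4, 12, 15, 18, 20],
    "Activity": [11, 14, 17, 19],
}


def signed_score(space):
    """Map a checked space (1-7) to +3..-3, with the midpoint (4) as 0.

    Assumes Space 1 is adjacent to the first-named adjective of the pair.
    """
    return 4 - space


def factor_scores(protocols):
    """Average signed scores per factor for one card.

    protocols: one dict per S, mapping item number (1-20) to checked space.
    """
    scores = {}
    for factor, items in FACTOR_ITEMS.items():
        values = [signed_score(p[item]) for p in protocols for item in items]
        scores[factor] = sum(values) / len(values)
    return scores


# Two hypothetical Ss rating a single card.
s1 = {item: 2 for item in range(1, 21)}  # leans toward the first-named pole
s2 = {item: 6 if item in FACTOR_ITEMS["Potency"] else 4 for item in range(1, 21)}

card_scores = factor_scores([s1, s2])
print(card_scores)
```

Because the scores are averages of signed values, opposite ratings by different Ss cancel, so a score near zero may reflect either neutrality or disagreement.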


“Meaning” of Rorschach 


Clinicians have especially attributed to 
three blots of the Rorschach series particular 
symbolic characteristics. Card IV has been 
considered as the “father” card, Card VI— 
“sex,” and Card VII, the “mother” card. The 
symbolism of these cards has also received 
some support from several experimental stud- 
ies (Hirschstein & Rabin, 1955; Meer & 
Singer, 1950; Rosen, 1951). An examination 
of the meaning of these cards via the se- 
mantic differential is, therefore, of particular 
interest. With reference to Card IV, Bochner 
and Halpern (1945) stated: “. . . this card
embodies something sinister . . . may sug-
gest the father or authority in general.” Ac-
cording to the same authors, Card VI “. . . is
weighted for sexual implications . . . it is
generally . . . the most difficult card to inter-
pret,” whereas Card VII is considered to have
“. . . a feminine quality, frequently with ma-
ternal implications.”

If we were to examine the adjectival de- 
scriptions of the cards, presented above and 
based on the semantic differential, we would 
note some validation of these hypotheses 
used by Rorschach workers and represented 
in the descriptions just quoted. The “sinister” 
and possibly some of the masculine qualities 
of Card IV are well reflected in its description
as “bad, ugly, strong, dirty, cruel, unpleas- 
ant, ferocious, heavy, thick, dishonest, active, 
rough, and rugged.” The “soft” and “femi- 
nine” quality of Card VII, on the other hand, 
becomes expressed via the opposite polar ad- 
jectives—“good, beautiful, weak, clean, kind, 
pleasant, peaceful, thin, passive, smooth, and 
delicate.” It is also interesting to note that 
on 15 of the semantic differential scales, these 
two cards differ significantly and appear at 
the opposite ends of the polar continuum. 
The obtained profile of the distribution of 
the adjectives for Card IV is almost a per- 
fect mirror image of that of Card VII. This 
is the most outstanding and clear-cut differ- 
ence between any two cards in the series. 
We have, of course, not validated the “father” 
and “mother” symbolism, but masculinity vs. 
femininity, and danger vs. security do seem 
to be represented by these visual stimuli. 
Osgood et al. (1957) have briefly reported 
a study by Smith in which 10 scales were 
utilized, and the cards were compared with 
the concepts. Smith’s findings did not justify
the association between the cards and the 
parental figures. The selection of the scales 
may be a determining factor. The detailed 
findings of this study are not yet available 
and, therefore, cannot be related to our re- 
sults. 

Whereas Cards IV and VII are “most 
meaningful” in terms of the semantic differ- 
ential (17 scales—statistically significant for 
each), Card VI seems to be “least meaning- 
ful” for our Ss (only one significant differ- 
ence). It certainly does appear to be “the 
most difficult to interpret.” However, no 
inkling as to the sexual symbolism was ob- 
tained. Maybe the semantic differential scales 
employed do not tap sex-related meaning. At 
any rate, within the limitations of the instru- 
ment used, it must be stated that this card 
does not elicit any agreement as to its mean- 
ing in our Ss. 

The effects of color in Rorschach’s five 
chromatic cards (II, III, VIII, IX, and X) 
and the interpretation of the color response 
have been subjects of considerable contro- 
versy as a result of divergent findings in a 
number of investigations (Ainsworth, 1954). 
The appearance of color in the Rorschach se-
ries has been considered as somehow “disturb-
ing,” “displeasing,” “startling,” and “shock-
ing.” The “color shock” phenomenon de-
scribed originally by Rorschach (1944) was
allegedly present primarily in neurotics. Others 
(Beck, 1946) considered it a much more wide- 
spread phenomenon. Our findings with a “nor- 
mal” student population do not seem to bear 
out the contention that color is disturbing. 
Card II, the first one in the series in which 
color is present, is by and large seen in posi-
tive terms. The semantic differential yields
such adjectives as “good, strong, clean, pleas- 
ant, kind, happy, peaceful, and honest.” In 
view of such descriptive adjectives, it can 
hardly be called “disturbing” or unpleasant. 
The remaining color cards (III, VIII, IX, 
and X) reflect even a greater variety of posi- 
tively toned qualities and meanings. These re- 
sults are consonant with Wallen’s (1948) ex- 
perimental findings that “multi-colored cards 
are better liked in the standard form than 
when they are rendered achromatically.” He 
adds, however, that “unstable men show more 
preference for the achromatic version.” Per-
haps our sample was made up mostly of stable
Ss. It may also be argued that the “shock,” 
frequently measured by delayed response 
time, could not be detected in our experi- 
mental procedure. But, on the other hand, 
there seems to be no adequate reason why 
our Ss would rationalize their responses or 
“cover up” the meaning of, and their reac- 
tion to, the cards. To our knowledge there 
was no special need for the kind of compli- 
ance which operates when responses to so- 
cially and emotionally conditioned stimuli are 
required. If interpreted parsimoniously, the 
results simply indicate that rather positive 
meanings are attributed to the color cards by 
a random sample of college students who are 
presumably largely nonneurotic. 

Some of the issues raised above and im- 
plied in the findings point to the rich oppor- 
tunities presented by the semantic differential 
for the further systematic investigation of the 
Rorschach technique. Relating the results with 
the several symbolic cards to meanings ob- 
tained with verbal stimuli (e.g., father, mother, 
etc.), which allegedly they represent, studying 
differences in the meaning of various cards be- 
tween normals and clinical groups, observing 
the relationship between the meaning of cards 
and the responses produced in the testing 
situation, and many other approaches can 
contribute appreciably to the basic under- 
standing of the Rorschach method. 


Summary 


Sixty-six college students, 28 males and 38 
females, checked 20 items of the semantic 
differential on a seven-point scale for the 10 
Rorschach inkblots. The data were dichoto- 
mized, and the significance of the differences 
in the choice of each polar adjective was de- 
termined by means of the chi square tech- 
nique. For the total group, 112 (out of 200) 
chi square values, significant between the .05 
and beyond the .001 level, were obtained. The 
numbers of significant chi squares for the males
and females treated separately were 67 and
86, respectively. The representation of the
three factors—Evaluative, Potency, and Ac- 
tivity—in the obtained meaning of each card 
was also reported. The results permit the fol- 
lowing conclusions: 


A. I. Rabin 


1. Despite considerable overlap in the mean- 
ing of some pairs of cards, no card was de- 
scribed in exactly the same terms as any other. 

2. There was considerable range in the 
number of adjectives, for which significant 
differences were obtained, which describe any 
one card. Cards IV and VII are most mean- 
ingful, whereas Card VI has practically no 
commonality of meaning. 

3. Cards IV and VII emerged with almost 
perfectly opposite meanings. 

4. The colored cards were considered by the 
group as pleasing and in, primarily, positive 
terms. 

5. No sex differences in the attribution of 
meaning to the cards were obtained. 

The results were discussed in the light of 
existing Rorschach theory concerning card 
symbolism and the effects of color upon Ror-
schach performance. Suggestions for the fur-
ther study of the Rorschach by means of the
semantic differential were made.
WHAT DOES THE RORSCHACH Z SCORE
REFLECT? 


LEON S. OTIS 
Johns Hopkins University 


When Beck (1933) introduced Z, he sought
a measure which would reflect the individu-
al’s tendency to organize, relate, and abstract
aspects of the Rorschach cards into meaning-
ful units. Because measures of abstractive
and organizing abilities are, presumably,
included in many of the current intelli-
gence tests, and because Beck has frequently
related Z (on an intuitive basis) to high in- 
tellectual ability (Beck: 1933, 1949), many 
authors have assumed that Z and intelligence 
test scores ought to be highly correlated. 

The evidence relating Z to intelligence test 
scores or academic achievement, however, has 
been inconclusive. Wishner (1948) found a 
significant correlation between Z and the
Wechsler-Bellevue Fullscale and Subscales,
but negative findings between intelligence
tests and Z have been reported also (Gold- 
farb, 1945; Jolles, 1947). McCandless (1949) 
failed to find a significant correlation between 
academic achievement and Z. 

The meaning of these contradictory results 
is difficult to assess, as the subjects of these 
studies varied greatly as to their mental health 
(one study used 42 neurotics as its sample), 
their educational background, their age, and 
their mental capacity (the sample in one 
study consisted of 66 feeble-minded children 
under 18 years of age). Also, these groups 
differed markedly from the 39 “intellectually 
very superior” individuals used by Beck 
(1933) to derive the weights assigned to the 
different Z responses. Beck’s sample consisted 
essentially of 32 individuals in the psycho- 
logical, medical, and related fields, 5 ad- 
ministrators, a lawyer, and a statistician. Fi- 
nally, Beck (1949, p. 58) suggested that the 
“organizational activity” reflected by Z is 
only a component of intelligence. Accordingly,
the expectation that Z ought to correlate
highly with intelligence test scores or with 
academic achievement appears to be based on 
an overly restrictive definition of intelligence, 
an unwarranted limitation on the sorts of 
abilities which Z, in fact, does reflect, or both. 

If Z reflects traits other than those tapped 
by the usual general intelligence tests, the 
question is raised: What traits are these and 
how are they to be identified? If groups which 
scored high on Z and which differed markedly 
from the general population in other respects 
could be found, the nature of the correlations 
between Z and the traits which are widely 
shared by the “high Z” group (but not shared 
by the “low Z” group) ought to lead to some 
hypothesis regarding what Z reflects. 

The present study is a step in this direc-
tion. It compares the Z of a group of execu-
tives with the Z of lower echelon occupa-
tional groups. It seems clear that executives
differ markedly in broad intellectual and other 
personality traits from other employees (see 
Meyer & Pressel, 1954). This study asks 
whether the Z of executives also differs sig-
nificantly from that of nonexecutives.

The sample in this study consisted of 154
of the 157 records of Beck’s Speigel group.¹
Three of the records were not available. The
records of 36 executives and junior executives
(abbreviated, EXEC), 48 skilled workers (SK),
45 semiskilled workers (S-SK), and 25 un-
skilled workers (UN-SK) were examined.
Membership in these groups was indicated on
the protocols.


1 This study was undertaken while the author was
serving a clinical internship at the University of Illi-
nois, College of Medicine, Division of Psychiatry.
Thanks are due to Alan Rosenwald, director of psy-
chological training at the University of Illinois, for
his help and valuable suggestions. The writer is also
grateful to S. J. Beck and S. J. Korchin for their
cooperation in making the records of Beck's Speigel
population available for study.


Table 1


Chi Square Comparisons of the Differences in Z Scores
Between Executives and Nonexecutives


25
22


SK 24
UN-SK 12


S-SK 23
UN-SK 12


Table 1 shows that Z distinguished the
EXEC group from the other groups but did
not distinguish between the “nonexecutive”
groups. The median scores of the four groups
were 27.0 for the EXEC group, 17.0 for the SK
group, 16.0 for the S-SK group, and 15.0 for
the UN-SK group.

The wide disparity between the Z scores of 
the EXEC group and the scores of the other
three groups is noteworthy. If this finding can 
be taken at face value, it suggests that Z re- 
flects at least some of the characteristics which 
lead to the top level of the managerial hier- 
archy. 


Of particular interest here, however, is the 
fact that Z, a score which was derived on the 
basis of examining the Rorschach records of 
a group of scholars and professional people, 
was successful in distinguishing executives 
from other employees. Although differing in 
many respects (in terms of values, life style, 
goals, etc.) from each other, scholars and 
executives obviously share some important 
traits or characteristics in common. These ap- 
pear to be reflected in Z responses. Perhaps 
Z reflects a rather global trait which might 
be called “giftedness”; or perhaps Z reflects 
ambition, or intellectual curiosity, or drive. 
These are some of the traits which scholars 
and executives might be expected to share. 
This sort of guesswork is completely unsatis- 
factory, however. What is needed is a factor 
analytic study of scholars, executives, and 
nonscholars and nonexecutives with the Ror- 
schach Z score as one of several dimensions 
of comparison. 
BRIEF REPORTS


ECOLOGIC FACTORS IN THE WAIS PICTURE 
COMPLETION TEST 1


BERNARD L. BLOOM 2 
Territorial Hospital, Kaneohe, Hawaii 


The present study was designed to examine the 
impact of ecologic factors on the responses to 
one Wechsler subtest in a situation where pro- 
nounced differences were expected and where di- 
rectional hypotheses could be advanced. Specif- 
ically, the Picture Completion subtest (PC) of 
the Wechsler Adult Intelligence Scale (WAIS) 
was administered to 67 second- and third-year 
student nurses in St. Louis, Missouri, and an 
equal number of student nurses in Hawaii, none 
of whom had ever been to the mainland. In ad- 
dition, all subjects were given the WAIS Vo- 
cabulary subtest in group form in order to assess 
the level of linguistic skill. 

The following hypotheses were tested: 1. Main-
land student nurses will score higher than
Hawaii student nurses on PC Items 11 (stars), 
12 (dog tracks), 13 (Florida), and 20 (snow). 
2. Hawaii student nurses will score significantly 
better than student nurses on the mainland on 
PC Items 8 (peg), 9 (oar lock), 14 (smoke- 
stacks), and 15 (crab’s leg). 

1 An extended report of this study may be ob-
tained without charge from Bernard L. Bloom, Ter-
ritorial Hospital, Kaneohe, Hawaii, or for a fee from
the American Documentation Institute. Order Docu-
ment No. 5894, remitting $1.75 for microfilm or
$2.50 for photocopies.

2 The author wishes to express his appreciation to
Ivan N. Mensh and his assistants Judith A. Hilt-
brand and Eileen J. Turken, formerly of the Depart-
ment of Medical Psychology, Washington University
School of Medicine, who so kindly provided the stu-
dent nurse data for the mainland group.


Both groups of student nurses obtained higher
average scores and had less variability on the PC
and Vocabulary subtests than the standardization 
sample of equivalent age. The mainland group 
obtained higher scores than the Hawaii group on 
both subtests with the difference being at the .01 
level on Vocabulary and at between the .05 and 
.10 levels on PC. The order of presentation on 
the WAIS test is somewhat more appropriate for 
the mainland group than for the Hawaii group. 
The subtest scores correlate .22 with each other 
in the mainland group and .00 in the Hawaii 
group. 

All of the four items hypothesized to be easier 
for the Hawaii student nurse group appear to be 
somewhat easier for this group, and the differ- 
ence on PC Item 15 (crab’s leg) achieves signifi- 
cance at the .02 level. Of the four items hy- 
pothesized to be easier for the mainland group, 
three are easier for this group and the differences 
on PC Items 12 (dog tracks) and 13 (Florida)
are significant at the .01 level. PC Item 11 (stars)
is easier, though not significantly so, for the
Hawaii group. Thus, of the eight items for which
directional differences were predicted, only one
comparison failed to lie in the predicted direction.
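
As a rough illustration of how unlikely seven of eight correct directional predictions would be by chance alone, the tally above can be treated as a one-sided sign test. This is a minimal sketch and not an analysis reported in the study. It assumes that the eight item comparisons are independent (only approximately true, since the same subjects answered every item) and that either direction is equally likely under chance. The function name `binomial_upper_tail` is illustrative.

```python
from math import comb

def binomial_upper_tail(successes, trials, p=0.5):
    """Probability of at least `successes` hits in `trials` independent tries."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Seven of the eight item differences fell in the predicted direction.
chance_probability = binomial_upper_tail(7, 8)
print(f"{chance_probability:.4f}")
```

Under these assumptions the probability is 9/256, or about .035.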

In addition to suggesting that the PC subtest 
may be less suitable for use in Hawaii than on 
the mainland, these results tend to show that 
specific characteristics of a subject’s environment 
should be taken into account when evaluating 
test results and that ecologic characteristics may 
have a predictable and demonstrable relationship 
to psychological test performance. 


Brief Report. 
Received January 13, 1959. 
PERSONALITY IMPLICATIONS OF CIGARETTE SMOKING
AMONG COLLEGE STUDENTS 


DANIEL S. P. SCHUBERT 


University of Chicago 


Freud considered smoking an adult extension of 
autoerotic sucking in childhood, and Fenichel re- 
lates oral fixation to the manic-depressive cycle. 
The hypothesis is suggested that manic-depressive 
tendencies vary directly with amount of smoking. 
Vallance’s finding (1940) that smokers are less
suggestible points to the second hypothesis that
hysteria varies inversely with amount of smoking.
Lynn’s (1948) conclusion that nonsmokers have 
better “disciplinary records” at secondary school 
suggests the additional hypothesis that psycho- 
pathic-deviate tendencies vary directly with smok- 
ing. 

The group form of the Minnesota Multiphasic 
Personality Inventory was administered to a col- 
lege group during freshman orientation. The Ma, 
D, Hy, and Pd scores were taken as measures of 
the tendencies indicated in the hypotheses. The 
13 MMPI scale scores (3 validity, 9 clinical, and 
Si) were found to be representative of the north- 
eastern college population from which they were 
drawn. Of the group, 92 men and 134 women 


1 An extended report of this study may be ob- 
tained without charge from Daniel S. P. Schubert, 
Department of Psychology, University of Chicago, 
Chicago 37, Ill., or for a fee from the American 
Documentation Institute. Order Document No. 5895, 
remitting $1.25 for microfilm or $1.25 for photo- 
copies. 

2 Paper read at the meeting of the American Psy- 
chological Association, Washington, D. C., 1958. 

3 The author is indebted to Desmond Cartwright
and Loren Chapman of the University of Chicago 
for their helpful criticism and suggestions. 


were asked if they smoked cigarettes. Of these, 
47 men and 51 women did. This proportion of 
smokers is also representative of the general 
northeastern population for this age group (U. S. 
Department of Commerce, Bureau of Census, 
1957). 
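
The smoking rates implied by these counts can be worked out directly. The following is a minimal sketch that uses only the counts given above; the variable and function names are illustrative.

```python
# Counts reported in the text: (smokers, total asked)
counts = {"men": (47, 92), "women": (51, 134)}

def smoking_rate(smokers, total):
    """Fraction of a group that reported smoking cigarettes."""
    return smokers / total

rates = {group: smoking_rate(s, n) for group, (s, n) in counts.items()}

# Combined rate across both sexes.
total_smokers = sum(s for s, _ in counts.values())
total_asked = sum(n for _, n in counts.values())
rates["combined"] = smoking_rate(total_smokers, total_asked)

for group, rate in rates.items():
    print(f"{group}: {rate:.1%}")
```

This gives roughly 51% for the men, 38% for the women, and 43% for the two sexes combined.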

The results were in the predicted direction on 
the Ma, Pd, and Hy scales for both sexes and 
for the females on the D scale. The results for 
males on the D scale were not in the predicted 
direction. Two-tailed t tests showed the differences
on the Ma and Pd scales to be significant beyond the
.001 level for the men, and the difference on the Ma
scale to be significant beyond the .01 level for the
women. The differences on the Hy and D scales were
not significant, nor was the difference on the Pd
scale for the women. Similar t tests on other MMPI
scale means were not significant. There was
considerable overlap of smokers’ scores and
nonsmokers’ scores on all scales of this study.
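
For readers who want to see what such a comparison involves, the sketch below computes a pooled-variance t statistic from group summaries. The report does not state which form of the t test was used, so the pooled form is an assumption. The means and standard deviations in the example are invented placeholders, not results from the study. Only the group sizes (47 smoking and 45 nonsmoking men) follow from the counts in the text.

```python
from math import sqrt

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t for two independent groups, assuming equal variances.

    Returns the t statistic and its degrees of freedom.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df

# Placeholder scale means and SDs (hypothetical); group sizes from the text.
t, df = pooled_t(mean1=55.0, sd1=10.0, n1=47, mean2=50.0, sd2=10.0, n2=45)
print(f"t = {t:.2f} on {df} degrees of freedom")
```

For a two-tailed test, the absolute value of the statistic is compared with the tabled critical value for the given degrees of freedom.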


Brief Report. 
Received July 31, 1958. 
“SOCIAL DESIRABILITY” AND “ANXIETY” VARIABLES
IN THE IPAT ANXIETY SCALE 1


A. W. BENDIG 
University of Pittsburgh 


Bendig and Hountras (1959) have suggested 
that Cattell’s division of the items in the IPAT 
Anxiety Scale (Cattell, 1957) into “covert” and 
“overt” subscales was similar to the “subtle” and 
“obvious” items in the MMPI and that Cattell’s 
“covert” anxiety items should show a lower cor- 
relation with Edwards’ Social Desirability Scale 
(Edwards, 1957) than would the “overt” anxiety 
items. One difficulty in testing this hypothesis is 
that 22 of the 39 items on Edwards’ SDS are 
also scored for Taylor’s Manifest Anxiety Scale 
(Edwards, 1957, p. 32) and could be presumed 
to be more highly loaded with an “anxiety” fac- 
tor than are the remaining 17 items. In the fol- 
lowing research, these two subsets of SDS items 
were separately correlated with scores from Cat- 
tell’s scale. 

The 40-item Cattell and 39-item Edwards’ SDS 
scales were administered to 238 Ss (110 men and 
128 women) enrolled in educational psychology
courses. Seven scores were obtained for each
S: Covert Anxiety (CA), Overt Anxiety (OA), 
Difference Score (DS) (CA minus OA), and To- 
tal Anxiety (TA) (CA plus OA) from Cattell’s 
scale, and SD-MAS, SD-NonMAS, and SD-To- 
tal (SD-MAS plus SD-NonMAS) from Edwards’ 
scale. Correlations among the seven scores were 
separately computed within each sex group and 
then averaged, since the men were significantly 
lower on OA, TA, SD-MAS, and SD-Total. 
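
The scoring scheme just described can be sketched as follows. This is a minimal illustration and not the author's procedure: the report does not say how the within-sex correlations were averaged, so a plain arithmetic mean of the two coefficients is assumed here. The function names and the sample subscores are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def derived_scores(ca, oa, sd_mas, sd_nonmas):
    """Build the seven scores for one subject from the four raw subscores."""
    return {
        "CA": ca,
        "OA": oa,
        "DS": ca - oa,                   # Difference Score: CA minus OA
        "TA": ca + oa,                   # Total Anxiety: CA plus OA
        "SD-MAS": sd_mas,
        "SD-NonMAS": sd_nonmas,
        "SD-Total": sd_mas + sd_nonmas,
    }

def averaged_within_group_r(groups, a, b):
    """Correlate scores a and b within each group, then average the r values."""
    rs = [pearson([s[a] for s in g], [s[b] for s in g]) for g in groups]
    return sum(rs) / len(rs)

# Made-up subscores for illustration only: (CA, OA, SD-MAS, SD-NonMAS)
men = [derived_scores(*row) for row in [(8, 5, 18, 14), (12, 10, 12, 11), (15, 14, 9, 10)]]
women = [derived_scores(*row) for row in [(9, 9, 16, 13), (13, 12, 11, 12), (16, 17, 7, 9)]]
print(round(averaged_within_group_r([men, women], "OA", "SD-MAS"), 2))
```

Correlating within each sex before averaging keeps the mean differences between men and women from inflating or masking the relationships among the scores.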

The correlation between CA and SD-Total 
(-.56) was significantly lower than the OA and


1 An extended report of this study may be obtained
without charge from A. W. Bendig, Dept. of
Psychology, University of Pittsburgh, Pittsburgh 13,
Pa., or for a fee from the American Documentation
Institute. Order Document No. 5919, remitting $1.25
for microfilm or $1.25 for photocopies.


SD-Total correlation (-.70). OA was more highly
related to SD-MAS (-.76) than it was to
SD-NonMAS (-.48), while CA scores showed similar
correlations (-.53 and -.47) with both subsets
of SDS items. The TA score correlated higher
with SD-MAS (-.71) than it did with SD-NonMAS
(-.52). DS correlated significantly with
SD-MAS (.36) and with SD-Total (.29), but not 
with SD-NonMAS (.10). The correlation between 
CA and OA was .58 and between SD-MAS and 
SD-NonMAS was .59. 

The hypothesized higher correlation of OA and 
SD-Total was confirmed, but is attributable to a 
higher correlation of OA with the SD-MAS items. 
All of the scores, with the exception of DS, seem 
to measure a general “social desirability” vari- 
able which accounts for most of the relationships 
among the scores. However, OA and SD-MAS 
appear also to have another variable in common: 
presumably an “anxiety” component. CA appar- 
ently is not a useful measure of “anxiety” ex- 
cept insofar as it operates within the DS score 
as a suppressor variable for “social desirability.” 


Brief Report. 
SEX DIFFERENCES IN BODY CONCEPTS 1


GEORGE CALDEN 


Veterans Administration Hospital, Madison, Wisconsin 


RICHARD M. LUNDY AND RICHARD J. SCHLAFER


University of Wisconsin 


There is little doubt about the importance of 
bodily and facial attractiveness in American so- 
ciety, yet there has been negligible research in 
this field. The aim of this study was to compare
males and females in their degree of satisfaction
or dissatisfaction with their own body and facial
features. In addition, the sexes were compared in
their ratings of the attractiveness or
unattractiveness of various male body builds.

A “body concept” questionnaire was given to 
196 female and 110 male students of an inter- 
mediate university psychology course. The form 
consisted of 13 items pertaining to the S’s physi- 
cal features and to the extent of satisfaction each 
S feels toward these features. Also included was 
a scale for rating and describing the attractive- 
ness or unattractiveness of seven male body 
types. The figures, presented on slides, were front 
view photographs selected from Sheldon’s Atlas
of Men, representing a balanced somatotype and 
the moderate and extreme forms of endomorphy, 
mesomorphy, and ectomorphy. 

The results indicate that males wish to be three 
pounds heavier on the average; females seven 
pounds lighter. All females dissatisfied with their 
weight wish to weigh less. Only half of the dis- 
satisfied males wished to be lighter. Females, 
however, are more satisfied with their height than 
are males. All but two of the dissatisfied males 

1 An extended report of this study may be obtained
without charge from George Calden, Psychology
Service, Veterans Administration Hospital,
Madison, Wisconsin, or for a fee from the American
Documentation Institute. Order Document No. 5921,
remitting $1.75 for microfilm or $2.50 for
photocopies.


wished to be taller, whereas one half of the dis- 
satisfied females wished to be shorter. Females 
express less satisfaction with the attractiveness 
of their bodies. Males wish to have wider shoul- 
ders, thicker arms and legs; females prefer 
smaller hips and waists and thinner arms and 
legs. Almost half of the females desired larger 
busts and half of the males desired bigger chests. 

Females also are less satisfied with their facial 
features than are males. Ss of both sexes single 
out the nose as the facial feature they most wish 
to have altered, usually made smaller. Men,
however, wished to have more prominent chins and
less prominent ears, whereas females preferred
bigger eyes, better vision, and a more oval facial shape.

Ss of both sexes regard the balanced somato- 
type figure as most attractive and the extreme 
endomorph as least attractive of the seven male 
body types. Females, however, viewed the for- 
mer as more attractive and the latter body type 
as more repulsive than did the males. Females 
also were less inclined to view the extreme
mesomorph (the “muscle man”) as attractive. All of
the above differences were significant at the .01 
level. 

In general, females desire changes from the 
waist down and wish for smallness and petite- 
ness of body parts (except for bust). Males are 
dissatisfied with body dimensions from the waist 
up, desiring bigness of body parts. The greater 
dislike among females for extreme endomorphic 
and mesomorphic male physiques may be in part 
a projection of their personal wish for delicate- 
ness in their own bodies. 